You would first read the file using fopen() and fread() and put it in a string.
Then you would use preg_match_all() (eregi() did the same job in old PHP, but it was removed in PHP 7) and explode() to find all the <a href="link.htm">some value</a> tags.
Then you would further strip out everything else and get the link...
I'm not going to write the script. I am too tired. I hope this helps though.
And this process is called WEB SPIDERING 🙂
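
For anyone who wants a starting point, here is a rough sketch of the link-finding step, not a finished spider. The function name extract_links() and the sample HTML are made up for illustration, and it assumes the page is already in a string (from fread() or file_get_contents()). It uses preg_match_all() since eregi() is gone in current PHP.

```php
<?php
// Hypothetical helper: returns the href value of every <a> tag in $html.
function extract_links(string $html): array
{
    // Case-insensitive; accepts href="..." or href='...'.
    // Group 1 is the opening quote, group 2 is the link itself.
    $pattern = '/<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1/is';
    preg_match_all($pattern, $html, $matches);
    return $matches[2];
}

// Sample input, invented for this example. In a real script $html
// would be the string you read from the file or URL in step one.
$html = '<p><a href="link.htm">some value</a> and <A HREF=\'other.htm\'>more</A></p>';

print_r(extract_links($html));
```

A regex like this will miss unquoted href values and can trip on messy markup, so for real pages PHP's DOMDocument class is probably the safer route.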