Can anyone help me with this one? I am trying to extract all references to files and links in an HTML file so that I can reformat them.
For example, if the HTML code has this in it:
<a href="page101.htm"><img src="images/home.gif"></a>
I would like to extract everything between the quotes of the href and src attributes. For the above example, I would get something like:
href_array[0]="page101.htm";
src_array[0]="images/home.gif";
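To make the goal concrete, here is a rough sketch of the extraction step. It assumes Python and its standard re module, since no language is fixed here; extract_attr is a made-up helper name, and the pattern only handles quoted attribute values. It would also match lookalike attributes such as data-href, so treat it as a starting point rather than a full HTML parser.

```python
import re

def extract_attr(html, attr):
    """Return the quoted values of every `attr="..."` found in html.

    Hypothetical helper for illustration; handles single or double
    quotes, but not unquoted attribute values.
    """
    pattern = re.compile(
        r'\b' + re.escape(attr) + r'\s*=\s*(["\'])(.*?)\1',
        re.IGNORECASE | re.DOTALL,
    )
    # Group 1 is the quote character, group 2 is the value between the quotes.
    return [m.group(2) for m in pattern.finditer(html)]

html = '<a href="page101.htm"><img src="images/home.gif"></a>'
href_array = extract_attr(html, "href")
src_array = extract_attr(html, "src")
print(href_array, src_array)
```
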
The reason I want to do this is so that I can run through the file and prefix each reference with the base URL. For example, href_array[0] would translate to:
href_array[0]="http://www.domain.com/page101.htm";
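The replacement step might look something like the sketch below, again assuming Python. BASE_URL, ATTR_RE and absolutize are illustrative names, not anything from an existing script. It rewrites the href and src values directly in the HTML text and uses urllib.parse.urljoin from the standard library, which leaves references that are already absolute unchanged.

```python
import re
from urllib.parse import urljoin

BASE_URL = "http://www.domain.com/"  # base URL taken from the example above

# Groups: 1 = attribute name, 2 = the "=" with any spacing,
# 3 = the quote character, 4 = the value between the quotes.
ATTR_RE = re.compile(
    r'\b(href|src)(\s*=\s*)(["\'])(.*?)\3',
    re.IGNORECASE | re.DOTALL,
)

def absolutize(html, base_url=BASE_URL):
    """Return html with every quoted href/src value resolved against base_url."""
    def fix(match):
        name, eq, quote, value = match.groups()
        return name + eq + quote + urljoin(base_url, value) + quote
    return ATTR_RE.sub(fix, html)

html = '<a href="page101.htm"><img src="images/home.gif"></a>'
result = absolutize(html)
print(result)
```

Rewriting in place avoids having to map the array entries back to their positions in the file, though the same urljoin call would work on href_array and src_array entries one at a time.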
Is anyone good with regular expressions?
Thanks,
Chris