Apologies for my denseness. I am trying to extract from an HTML page a list of all the hyperlinks, images and any untagged text, but I'm stuck at the first hurdle.
To get the URL I am trying:
$string = "<div align=\"left\"><a href=\"http://www.cjindustries.co.uk\" target=\"_parent\"><img src=\"b_home.jpg\" alt=\"[Home]\" border=\"none\"></a><a href=\"index.html\"><img src=\"b_catl.jpg\" alt=\"[Catalog]\" border=\"none\"></a></div>";
preg_match_all("/a href=\"(.*)\"/", $string, $out);
But this matches nearly the entire string. Instead of one match per link, I get a single match, shown here as the full match and the captured group:
Array
(
    [0] => Array
        (
            [0] => a href="http://www.cjindustries.co.uk" target="_parent"><img src="b_home.jpg" alt="[Home]" border="none"></a><a href="index.html"><img src="b_catl.jpg" alt="[Catalog]" border="none"
        )

    [1] => Array
        (
            [0] => http://www.cjindustries.co.uk" target="_parent"><img src="b_home.jpg" alt="[Home]" border="none"></a><a href="index.html"><img src="b_catl.jpg" alt="[Catalog]" border="none
        )

)
I want the match to end at the first " after the URL; however, as you can see, it ends at the LAST " on the line.
Can someone tell me where I am going wrong?
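
For reference, a minimal sketch of the result being asked for, assuming the cause is that `.*` is greedy and therefore runs on to the last " it can find. Replacing it with the negated character class `[^"]*` (any run of characters that are not a double quote) makes the capture stop at the first closing quote. The single-quoted PHP strings are only there to avoid the backslash escapes; the markup is the same as above.

```php
<?php
// Same markup as in the question, written with single quotes to avoid escaping.
$string = '<div align="left"><a href="http://www.cjindustries.co.uk" target="_parent"><img src="b_home.jpg" alt="[Home]" border="none"></a><a href="index.html"><img src="b_catl.jpg" alt="[Catalog]" border="none"></a></div>';

// [^"]* cannot cross a double quote, so each capture ends at the
// first " after the URL instead of the last " on the line.
preg_match_all('/a href="([^"]*)"/', $string, $out);

// $out[1] holds the captured URLs, one entry per link.
print_r($out[1]);
```

With the sample string this should leave `http://www.cjindustries.co.uk` and `index.html` in `$out[1]`. A lazy quantifier, `(.*?)`, is another way to get the same effect here. Neither pattern is a robust HTML parser; both assume double-quoted `href` attributes written exactly as in the sample.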