Ok, here is code that works (mostly). I tried this out on a copy of Slashdot's index.html. It found 137 links, which looks like all of them except the ones that have embedded tags in them, like the banner ad at the top.
<?php
error_reporting(E_ALL); // report all errors, warnings and notices
print "Opening test.html:<br><br>\n\n";
$fp = fopen("test.html", "r") or die("Could not open test.html");
$text = "";
$x = 0;
print "Parsing test.html<br>\n";
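// Compare against false so a line containing only "0" does not end the loop early.
while (($line = fgets($fp, 2000)) !== false) {
print "Parsing line ".++$x."<br>\n";
$text .= $line;
}
fclose($fp);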
print "<br>Pattern Matching:<br>\n\n";
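// The + and * quantifiers matter: without them each character class matches only a single character.
$pattern = "|(<[Aa] [HhRrEeFf]+[ a-zA-Z0-9/\"'.?&=:,-]*>";
$pattern .= "[ a-zA-Z0-9/\"'.?&=:,-]*</[Aa]>)|";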
preg_match_all($pattern, $text, $matches);
print "<b>".count($matches[1])." Matches Found</b><br><br>\n";
for ($i = 0; $i < count($matches[1]); $i++) {
print "Link Found: ".$matches[1][$i]."<br>\n";
}
?>
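For the links it misses (the ones with tags inside the anchor, like the banner ad), a looser pattern might do the job. This is only a rough sketch and I have not run it against the Slashdot page: find_links() is a made-up helper name, and the pattern assumes anchors are not nested and that every opening <a ...> has its closing </a>.

```php
<?php
// Hypothetical helper: returns every <a ... href=...>...</a> element found in $html,
// including anchors whose link text contains other tags (<img>, <b>, ...).
function find_links($html) {
    // <a\s        opening tag name followed by whitespace (so <abbr> is skipped)
    // [^>]*href   any attributes up to an href, all inside the same tag
    // .*?</a>     lazily take everything up to the first closing tag
    // i = case-insensitive, s = let "." match newlines as well
    $pattern = '|(<a\s[^>]*href\s*=[^>]*>.*?</a>)|is';
    preg_match_all($pattern, $html, $matches);
    return $matches[1];
}

// Small made-up sample instead of test.html.
$sample  = '<a href="/plain.html">Plain link</a>'."\n";
$sample .= '<A HREF="/ad"><img src="banner.gif" alt="ad"><b>Banner</b></A>'."\n";
$sample .= '<a name="top">Not a link</a>'."\n";

$links = find_links($sample);
print "<b>".count($links)." Matches Found</b><br><br>\n";
foreach ($links as $link) {
    print "Link Found: ".$link."<br>\n";
}
?>
```

The lazy .*? is what lets the embedded tags through. The price is that it will happily swallow broken markup too, so check the results by eye.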
I hope this helps out.