Hey man,
I have successfully harvested all the links on the Yahoo home page. Oh, and just for kicks, did you know that they have 184 links on their home page? 🙂 The bad news is I have not really had much time to work with the regular expressions, so it can't grab the links on just any page yet. Here is my code, though; maybe it will help.
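$url = "http://www.yahoo.com";
$website = fopen($url, "r");
if ($website === false) {
    die("Could not open " . $url);
}
// A single fread() on a network stream can return less than requested,
// so stream_get_contents() is used to read up to 25000 bytes.
$page = stream_get_contents($website, 25000);
fclose($website);
$match = array();
// Matches "a href=" plus an optional slash and one run of letters/digits.
// It stops at the first ":", "." or quote, so full URLs are not captured yet.
$mainCount = preg_match_all("/a href=\/?[a-z0-9]+\/?/i", $page, $match);
for ($x = 0; $x < $mainCount; $x++) {
    // Drop the 7-character "a href=" prefix.
    $firstLevel = substr($match[0][$x], 7);
    $firstLevel = str_replace("http", "", $firstLevel);
    // Escape on output instead of escaping the whole page before matching.
    echo htmlspecialchars($firstLevel) . "<br>";
}
echo $mainCount;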
Sorry, I was messing with it a bit to make it work with other sites, so it is not grabbing the full sub-path of the Yahoo links.
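For grabbing the links on any page, one option might be to skip the regex and let PHP's DOM extension parse the HTML instead. This is only a rough sketch, assuming the DOM extension is enabled; `extractLinks` is a name I made up, and the sample HTML string is invented just to show the idea.

```php
<?php
// Hypothetical helper (not part of the code above): returns every href found in <a> tags.
function extractLinks(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = array();
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        // Skip anchors that have no href (for example <a name="top">).
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return $links;
}

// Made-up sample input; in practice $html would be the page you fetched.
$html = '<p><a href="/news/">News</a> <a href=r/m1>Mail</a> <a name="top">no href</a></p>';
$links = extractLinks($html);
foreach ($links as $link) {
    echo $link . "\n";
}
echo count($links) . "\n";
```

Since the parser handles quoted and unquoted attributes itself, this should return the full href values, including relative ones, which you would still have to resolve against the page URL yourself.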