I created a crawler that crawls websites and stores certain data from them. It works great for most sites, but when crawling a few sites it errors out, giving me the Max Execution Time error on these lines:
"while ($pos < strlen($result)-20) {"
and
"if (eregi("HREF[ \t\n\rv]*=[ \t\n\rv]*[\"']*([^\"']*)[\"']*",$tag,$regs)) {"
Can you see anything that's wrong with the following code that could cause problems with some websites?
$urls = array();
$links = array();
$pos = 0;
while ($pos < strlen($result)-20) {
    $pos = strpos($result,"<",$pos);
    $pos++;
    $lastpos = strpos($result,">",$pos);
    $tag = substr($result,$pos,$lastpos-$pos);
    if (!strcasecmp(strtok($tag," "),"A") or !strcasecmp(strtok($tag," "),"AREA")) {
        $pos = $lastpos+1;
        $linkpos = $pos;
        $pos = strpos($result,"<",$pos);
        if (eregi("HREF[ \t\n\rv]*=[ \t\n\rv]*[\"']*([^\"']*)[\"']*",$tag,$regs)) {
            // Some other checks and then store to $urls array if it passes
        } else {
            $urls[] = "";
        }
    }
    $pos = $lastpos+1;
}
for ($i=0; $i<sizeof($urls); $i++) {
    // store URLs to database
}
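Edit: in case it helps anyone reproduce this, I wonder if strpos() returning FALSE could be resetting the scan position. Here's a stripped-down sketch (the input string is made up, and I capped the iterations so it terminates) where the same loop never advances once the page ends in a "<" with no closing ">":

```php
<?php
// Made-up input: padding so the strlen($result)-20 guard passes,
// then a trailing "<a href=foo" with no closing ">".
$result = str_repeat("x", 30) . "<a href=foo";

$pos = 0;
$iterations = 0;
while ($pos < strlen($result) - 20) {
    $pos = strpos($result, "<", $pos);     // finds the "<" at offset 30
    $pos++;
    $lastpos = strpos($result, ">", $pos); // no ">" after it, so FALSE
    // ... tag handling elided ...
    $pos = $lastpos + 1;                   // FALSE + 1 == 1, scan restarts at offset 1
    if (++$iterations > 5) {               // cap added only for this sketch
        echo "loop never advances\n";
        break;
    }
}
```

Without the iteration cap this spins forever, which would explain the Max Execution Time error on the while line for some pages but not others.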