I've searched this forum for script bits, but I was wondering if anyone has a really good web crawler/bot detector script? I want to know whether my pages are being requested by a search engine or by a real user. This is the best I've found so far:
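<?php
// Returns TRUE if the user-agent string contains a known crawler name.
function is_bot($user_agent) {
    $spiders = array('Googlebot', 'MSNBOT', 'FAST-WebCrawler', 'Gigabot', 'YahooSeeker', 'ZyBorg');
    foreach ($spiders as $bot) {
        // stripos() is case-insensitive; compare with FALSE because a match at position 0 returns 0
        if (stripos($user_agent, $bot) !== FALSE) {
            return TRUE;
        }
    }
    // if we reach this point, we've tried all the bots with no match
    return FALSE;
}
// not every client sends a User-Agent header, so fall back to an empty string
$user_agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
// echoing FALSE directly prints an empty string, so print a label instead
echo 'result:' . (is_bot($user_agent) ? 'bot' : 'not a bot');
?>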
This seems to work pretty well as far as I can tell, at least logic-wise. However, my list of bots is pretty short. Is there somewhere I can find a fairly comprehensive list of web crawlers/bots? I've seen stuff like:
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
in my access logs, and I have no idea if that's a bot or someone with a Yahoo search toolbar.
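To make the question concrete, here is a minimal sketch of how a logged string like the one above could be run against the list. It assumes PHP 7 or later. `matches_any_signature()` is a hypothetical variant of `is_bot()` that takes the list as a parameter, and the 'Slurp' entry is an assumption that only makes sense if that log line really comes from a crawler. The second sample string is a made-up browser user agent, included only for comparison.

```php
<?php
// Hypothetical helper: same idea as is_bot(), but the signature list is passed in
// so it can be extended without editing the function.
function matches_any_signature(string $user_agent, array $signatures): bool
{
    foreach ($signatures as $signature) {
        // stripos() does a case-insensitive substring search and returns false on no match
        if (stripos($user_agent, $signature) !== false) {
            return true;
        }
    }
    return false;
}

// 'Slurp' is an assumption: it belongs here only if that log entry is a crawler.
$signatures = array('Googlebot', 'MSNBOT', 'FAST-WebCrawler', 'Gigabot', 'YahooSeeker', 'ZyBorg', 'Slurp');

$samples = array(
    // the string from the access log
    'Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)',
    // made-up browser user agent for comparison
    'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)',
);

foreach ($samples as $ua) {
    echo (matches_any_signature($ua, $signatures) ? 'bot:  ' : 'user: ') . $ua . "\n";
}
```

Keeping the list outside the function means new names can be added as they show up in the logs. Since any client can send any user-agent string, a substring match like this is only a hint, not proof of who is making the request.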