I have a simple script that collects the visitor's environment info and stores it in a db.

I am trying to skip all visits by bots to my site.

I added this code to accomplish it:

function is_bot(){
	$bots = array("alexa", "appie", "Ask Jeeves", "Baiduspider", "bingbot", "Butterfly", "crawler", "facebookexternalhit", "FAST", "Feedfetcher-Google", "Firefly", "froogle", "Gigabot", "girafabot", "Googlebot", "InfoSeek", "inktomi", "looksmart", "Me.dium", "Mediapartners-Google", "msnbot", "NationalDirectory", "rabaz", "Rankivabot", "Scooter", "Slurp", "Sogou web spider", "Spade", "TechnoratiSnoop", "TECNOSEEK", "Teoma", "TweetmemeBot", "Twiceler", "Twitturls", "URL_Spider_SQL", "WebAlta Crawler", "WebBug", "WebFindBot", "www.galaxy.com", "ZyBorg");

	foreach ($bots as $bot) {
		if (strpos($_SERVER['HTTP_USER_AGENT'], $bot) !== false) {
			return 1;	// Is a bot
		}
	}
	return 0;	// Not a bot
}

if (!is_bot()) {

// store in db

}

For some reason I keep seeing results like this in my db table:

Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

Am I missing anything?

    zzz;10969150 wrote:

    Am I missing anything?

    Yeah; a robots.txt file in the root of your site. 🙂

      I have a robots.txt with directives, but I don't see what it has to do with the PHP code, which doesn't reference that file...

      I don't want to block robots, I just don't want to record their visits.

        This works for me; small tweak so it's not using a global:

        <?php
        
        $test = "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp) ";
        
        function is_bot($test){
        	$bots = array("alexa","appie","Ask Jeeves","Baiduspider","bingbot","Butterfly","crawler","facebookexternalhit","FAST","Feedfetcher-Google","Firefly","froogle","Gigabot","girafabot","Googlebot","InfoSeek","inktomi","looksmart","Me.dium","Mediapartners-Google","msnbot","NationalDirectory","rabaz","Rankivabot","Scooter","Slurp","Sogou web spider","Spade","TechnoratiSnoop","TECNOSEEK","Teoma","TweetmemeBot","Twiceler","Twitturls","URL_Spider_SQL","WebAlta Crawler","WebBug","WebFindBot","www.galaxy.com","ZyBorg");

        	foreach ($bots as $bot) {
        		if (strpos($test, $bot) !== false) {
        			return 1; // Is a bot
        		}
        	}
        	return 0; // Not a bot
        }
        
        if(!is_bot($test)){
        	echo 'NOT BOT';
        }else{
        	echo 'IS BOT';
        }
        ?>
        

        Output: IS BOT
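        One more thing worth checking while you test: strpos() is case-sensitive, so a crawler that reports itself in a different case (say, "googlebot" in lower case) would slip past "Googlebot" in the list. A stripos() sketch of the same function (the list here is trimmed for brevity; substitute your full $bots array):

```php
<?php
// Case-insensitive variant: stripos() ignores case, so "googlebot",
// "GOOGLEBOT" and "Googlebot" all match the same list entry.
// NOTE: list trimmed for brevity; use the full $bots array from above.
function is_bot($ua_string) {
	$bots = array("alexa", "Ask Jeeves", "Baiduspider", "bingbot",
	              "crawler", "Googlebot", "msnbot", "Slurp", "Teoma");
	foreach ($bots as $bot) {
		if (stripos($ua_string, $bot) !== false) {
			return true;  // matched a known bot token
		}
	}
	return false;         // no bot token found
}
```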

          I would most likely still go with a robots.txt solution.

          robots.txt

          User-agent: *
          Disallow: /reg_ua.php
          

          Then reg_ua.php can log the UA every time it is requested, since well-behaved bots will no longer access it. At the end it outputs the bytes for a 1px transparent GIF:

          storeUserAgent();
          header('content-type: image/gif');
          echo "GIF89a ....;";  # part after 89a was stripped by boards filter
          

          And you can easily control how often you want to log each visitor's user agent:

          echo '<img src="/reg_ua.php?t='. date('Ymd') .'" />';	# once per day
          echo '<img src="/reg_ua.php?t='. date('YmdH') .'" />';	# once per hour
          echo '<img src="/reg_ua.php?t='. time() .'" />';		# every time
          

          This approach should disregard all robots, whether you know their names or not. On the other hand, it will only register user agents that actually fetch embedded content such as images, which, for example, a cURL request would not do (automatically).
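          Putting those pieces together, reg_ua.php might look like the sketch below. storeUserAgent() is a hypothetical stand-in for whatever DB insert you use, and the base64 blob is the standard 43-byte 1x1 transparent GIF (encoded so no board filter can strip it):

```php
<?php
// reg_ua.php -- logs the user agent, then serves a 1x1 transparent GIF.
// robots.txt keeps well-behaved bots from ever requesting this file.

function storeUserAgent() {
    // Hypothetical stand-in for your DB insert; adapt to your own schema,
    // e.g. $pdo->prepare('INSERT INTO visits (ua) VALUES (?)')->execute(...);
    $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
}

storeUserAgent();
header('Content-Type: image/gif');
// Standard 1x1 transparent GIF, base64-encoded.
echo base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');
```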

            As dagon suggests, it seems like a scope issue. Try:

            function is_bot($ua_string) {
                $bots = array("alexa", "appie", "Ask Jeeves", "Baiduspider", "bingbot", "Butterfly", "crawler", "facebookexternalhit", "FAST", "Feedfetcher-Google", "Firefly", "froogle", "Gigabot", "girafabot", "Googlebot", "InfoSeek", "inktomi", "looksmart", "Me.dium", "Mediapartners-Google", "msnbot", "NationalDirectory", "rabaz", "Rankivabot", "Scooter", "Slurp", "Sogou web spider", "Spade", "TechnoratiSnoop", "TECNOSEEK", "Teoma", "TweetmemeBot", "Twiceler", "Twitturls", "URL_Spider_SQL", "WebAlta Crawler", "WebBug", "WebFindBot", "www.galaxy.com", "ZyBorg");

                foreach ($bots as $bot) {
                    if (strpos($ua_string, $bot) !== false) {
                        return 1;    // Is a bot
                    }
                }
                return 0;    // Not a bot
            }
            
            if (!is_bot($_SERVER['HTTP_USER_AGENT'])) {
            
            // store in db
            
            } 
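            A possible alternative (just a sketch, not what the original code does): collapse the whole list into a single case-insensitive regex with preg_match(), so one call handles casing and the loop disappears. The list below is trimmed for brevity; drop in the full $bots array:

```php
<?php
// Alternative sketch: one case-insensitive regex instead of a loop.
// preg_quote() escapes regex metacharacters (e.g. the "." in "Me.dium").
// NOTE: list trimmed for brevity; use the full $bots array.
function is_bot($ua_string) {
	$bots = array("alexa", "Ask Jeeves", "Baiduspider", "bingbot",
	              "Googlebot", "Me.dium", "msnbot", "Slurp", "Teoma");
	$quoted = array_map(function ($b) { return preg_quote($b, '/'); }, $bots);
	$pattern = '/' . implode('|', $quoted) . '/i';   // e.g. /alexa|Ask Jeeves|.../i
	return preg_match($pattern, $ua_string) === 1;
}
```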

              And you might need to quote it:

              dalecosp;10969246 wrote:
                  foreach ($bots as $bot) {
                      if (strpos("$ua_string", $bot) !== false) {
                          return 1;    // Is a bot
                      }
                  }
                  return 0;    // Not a bot
              }
