Hi all,

Well, I think the subject says it all:

I am trying to write a script that determines whether any part of HTTP_USER_AGENT matches a keyword from a list of robot names.

I have tried putting the list in an array and looping over it with foreach or for, using ereg, stripos, stristr, preg_match and several other options, without success.

The list could live in an array or in a MySQL database; either is fine.

$bots_list = Array("robot",
"unchaos",
"unchaosbot",
"robots.txt validator",
"die-kraehe meta-search-engine",
"sitidi.net",
"sitidibot"
);

I need an idea for something that reliably matches complete words, including keywords that contain spaces, and so on.

It isn't easy. I have worked on this for several hours, but I cannot find a good solution.

Thank you very much in advance.

Mapg

    [man]strpos[/man] would do that, or stripos (which is the same, only case-insensitive). It doesn't care whether a character is a space or not; it matches spaces with spaces as easily as it matches q's with q's.
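A minimal sketch of the stripos approach, assuming PHP 7.1 or later for the type declarations. `find_bot_keyword()` is a hypothetical helper name, and the User-Agent strings are made up for illustration. Multi-word keywords such as "die-kraehe meta-search-engine" need no special handling, but because this is a plain substring test it will also report "unchaos" for a string like "unchaosfun".

```php
<?php
// Hypothetical helper: returns the first keyword found anywhere in $user_agent,
// or null if none is found. stripos() is case-insensitive and treats spaces,
// dots and hyphens like any other character.
function find_bot_keyword(string $user_agent, array $keywords): ?string
{
    foreach ($keywords as $keyword) {
        // stripos() can return 0 (match at the start), so compare with !==.
        if (stripos($user_agent, $keyword) !== false) {
            return $keyword;
        }
    }
    return null;
}

$bots_list = array('robot', 'unchaos', 'unchaosbot', 'robots.txt validator',
    'die-kraehe meta-search-engine', 'sitidi.net', 'sitidibot');

// Made-up UA string containing a multi-word keyword in a different case.
var_dump(find_bot_keyword('Die-Kraehe Meta-Search-Engine 2.0', $bots_list));
```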

      A kind guy posted this script in another forum. I had tried this approach before, but as you can see, the robot "unchaos", for example, is detected as a robot, which is fine, but so is "unchaosfun", which is not in the array. That's the problem.

      My main concern is that I am not sure whether, in a real scenario, a script that matches keywords not present in the array could produce many false positives.

      stristr doesn't restrict the search to the exact elements of the array; it also matches them as substrings of longer words.

      I could use this option, but first I would like to research and learn about some alternatives.

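      <?php 
      $bots_list = Array('robot','unchaos','unchaosbot','robots.txt validator','die-kraehe meta-search-engine','sitidi.net','sitidibot'); 

      // Quote the array key, and fall back to '' when no User-Agent header was sent.
      $user_agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
      $is_bot = FALSE;

      // Plain substring test: stop at the first keyword found anywhere in the UA.
      for($i=0; $i < count($bots_list); $i++) { 
          if(stristr($user_agent, $bots_list[$i]) !== FALSE) { 
              $is_bot = TRUE; 
              break; 
          } 
      } 

      echo $is_bot ? 'User agent is a bot' : 'User agent is not a bot'; 
      ?>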

      Any idea is welcome.

      Mapg

        If you want a strict search, then $_SERVER['HTTP_USER_AGENT'] == $bots_list[$i] would do it, but that's probably not what you want (what about the version numbers that some UA strings include?).
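A minimal sketch of the strict (whole-string) comparison, assuming PHP 7.0 or later. `is_exact_bot()` is a hypothetical helper; lower-casing both sides keeps the comparison case-insensitive. It shows the weakness mentioned above: any version suffix in the UA string defeats the match.

```php
<?php
// Strict search: the whole User-Agent string must equal one keyword.
// Both sides are lower-cased so "Unchaos" and "unchaos" compare equal.
function is_exact_bot(string $user_agent, array $keywords): bool
{
    $normalised = array_map('strtolower', $keywords);
    return in_array(strtolower(trim($user_agent)), $normalised, true);
}

$bots_list = array('robot', 'unchaos', 'unchaosbot', 'robots.txt validator',
    'die-kraehe meta-search-engine', 'sitidi.net', 'sitidibot');

var_dump(is_exact_bot('Unchaos', $bots_list));      // bool(true)
var_dump(is_exact_bot('unchaos v1.4', $bots_list)); // bool(false): the version suffix breaks it
```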

        preg_match() could do it: /\bunchaos\b/i would match "unchaos", "unchaos v1.4" and "the crawler formerly known as unchaos", but not "unchaosfun" or "funchaos" (\b means "word boundary", and the i at the end makes the match case-insensitive).
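A minimal sketch of the word-boundary idea applied to the whole list at once, assuming PHP 7.1 or later. The helper names `build_bot_pattern()` and `match_bot()` are hypothetical. preg_quote() escapes characters such as the "." in "sitidi.net" so they are matched literally. One trade-off worth knowing: with \b, a generic keyword like "robot" no longer matches inside a compound name such as the made-up "SuperRobot/1.0", which the stristr version would have caught.

```php
<?php
// Build one case-insensitive alternation from the keyword list, wrapped in
// \b word boundaries, so "unchaos" matches "unchaos v1.4" but not "unchaosfun".
function build_bot_pattern(array $keywords): string
{
    $escaped = array_map(function ($keyword) {
        return preg_quote($keyword, '/'); // escape regex metacharacters and the delimiter
    }, $keywords);
    return '/\b(?:' . implode('|', $escaped) . ')\b/i';
}

// Returns the matched text from the UA string, or null if no keyword
// appears as a whole word.
function match_bot(string $user_agent, string $pattern): ?string
{
    return preg_match($pattern, $user_agent, $m) === 1 ? $m[0] : null;
}

$bots_list = array('robot', 'unchaos', 'unchaosbot', 'robots.txt validator',
    'die-kraehe meta-search-engine', 'sitidi.net', 'sitidibot');

// Build the pattern once, then reuse it for every request.
$bot_pattern = build_bot_pattern($bots_list);
$user_agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
echo match_bot($user_agent, $bot_pattern) !== null ? 'User agent is a bot' : 'User agent is not a bot';
```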

          Thank you for your suggestion.

          Well, I am now using the stristr option and watching a specific custom error_log that records $bots_list[$i] every time the function identifies a bot.
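A minimal sketch of that kind of debugging log, assuming PHP 7.1 or later. `log_bot_hit()` and the log file path are hypothetical; point it at any file the web server can write to. Recording the full UA string next to the matched keyword makes false positives such as "unchaosfun" easy to spot. error_log() with message_type 3 appends to the named file and does not add a newline, so the sketch adds one.

```php
<?php
// Append one line per detected bot: timestamp, matched keyword, full UA string.
// error_log() with message_type 3 appends $line to $log_file as-is.
function log_bot_hit(string $keyword, string $user_agent, string $log_file): void
{
    $line = sprintf("[%s] matched \"%s\" in UA: %s\n", date('Y-m-d H:i:s'), $keyword, $user_agent);
    error_log($line, 3, $log_file);
}

// Hypothetical log location for illustration.
log_bot_hit('unchaos', 'unchaos v1.4', sys_get_temp_dir() . '/bot_matches.log');
```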

          This way I am debugging the database and checking whether the function works well; if it doesn't, I will gladly use preg_match as you suggested.

          Best regards and thank you again.

          Mapg
