Spammers have started filling out the forms on my site. About half the form can be reduced to checkboxes and radio buttons. However, in the name, address, comments and similar fields, spammers are putting in their vile messages.

$ghost = $_POST['comments'];

A typical input ($ghost) might be something like:

"I'm interested in such and such. Can you please send me info about blah."

In most cases, the spammers will put a URL (or series of URLs) in the comments field. It looks something like:

"Come visit my site www.vileimagery.com for the best vile imagery."

How do I check the input ($ghost) for a URL in the midst of all the other text?

    // The "s" modifier lets .*? match across line breaks inside the link
    if (preg_match('/<a.*?href="[^"]*"[^>]*>.*?<\/a>/is', $ghost))
    {
        // Hmm... seems there's a URL
    }
    else
    {
        // Nope, no URL found via a link
    }

    Now, that only looks for HTML links (anchor tags). It doesn't catch bare URLs.
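To catch plain URLs as well, one option is a looser pattern that counts anything starting with http://, https:// or www., plus bare domains ending in a few common TLDs. This is a minimal sketch: count_urls() is a hypothetical helper, and the TLD list is an assumption you would tune to the spam you actually receive.

```php
<?php
// count_urls() is a hypothetical helper. It counts "http://...", "https://...",
// "www...." and bare "name.com"-style domains. The TLD list is an assumption;
// extend it to match the spam you actually receive.
function count_urls($text)
{
    $pattern = '~\b(?:https?://|www\.)\S+|\b[a-z0-9-]+\.(?:com|net|org|info|biz|ru)\b~i';
    return preg_match_all($pattern, $text);
}

$ghost = "Come visit my site www.vileimagery.com for the best vile imagery.";
// Anything with at least one URL is treated as suspect
$looks_like_spam = count_urls($ghost) > 0;
```

Counting rather than just matching lets you set a threshold (for example, allow one link but reject five). Expect some false positives, such as a legitimate visitor pasting their own web address.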

    The best way would probably be to come up with an array or list of words that typically appear in spam, and check whether any of them occur in the $ghost variable. If they do, either discard the message and still say "Your message was sent", or put it in a queue where you have to read it manually and approve it before it is sent.
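A minimal sketch of the word-list approach, assuming a hand-maintained array. The words in $spam_words and the helper name find_spam_words() are placeholders, not a recommended list. It uses a case-insensitive substring match via stripos(), so short entries can also match inside innocent words.

```php
<?php
// Placeholder word list; build yours from the spam you actually receive.
$spam_words = array('viagra', 'casino', 'vile imagery', 'cheap pills');

// Return every listed word that occurs in $text (case-insensitive substring match).
function find_spam_words($text, array $words)
{
    $found = array();
    foreach ($words as $word) {
        if (stripos($text, $word) !== false) {
            $found[] = $word;
        }
    }
    return $found;
}

$ghost = "Come visit my site www.vileimagery.com for the best vile imagery.";
// If anything matched: pretend it was sent, or queue it for manual review
$hold_for_review = count(find_spam_words($ghost, $spam_words)) > 0;
```

Returning the matched words, rather than just true or false, makes it easier to log why a message was held back.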

      I've noticed that a lot of the spam contains "BCC" headers in an effort to confuse my mail handler, so I explode the string on "bcc" and count the pieces. That alone removed about 90% of my spam.

      I count occurrences of URLs, "www", email addresses from my domain, "bcc" and a few other items, and throw the message away if any count is too high. Check your own email headers for spam scores to get other ideas.

      Here is a quick snippet:

      $domain = strstr($from_who, '@');

      // Lower-case everything so "BCC", "Bcc" and "bcc" are all counted
      $text = strtolower($email_address2.$email_subject.$Collected_Info.$email_headers.$from_who);
      // explode() returns (number of occurrences + 1) pieces
      $count_bcc = count(explode("bcc", $text));
      $count_ats = count(explode("@", $text));
      $count_ats_domain = count(explode("@domain", $text));

      // Send only if the sender isn't claiming my domain and the counts look normal
      if ($domain != "@domain.com" && $count_ats <= 10 && $count_bcc <= 1 && $count_ats_domain <= 3) {
          mail($email_address2, $email_subject, $Collected_Info, $email_headers);
          mail($email_address, $email_subject, $Collected_Info, $email_headers);
      }

        Hi. I personally don't like using a CAPTCHA, as I find it puts some people off filling in forms. It depends on how much you need the person to fill in the form; for example, if you want to sell them something, it's not a good idea. We just look for a URL and delete anything that comes in that way.

        I've been wondering about testing a checkbox labelled "Only click this if you are NOT human", as these robots seem to want to select everything 🙂 I'm not sure how well this would work.

          The checkbox wouldn't work. Hackers would look at the label or text preceding or following the checkbox to see if it says something like "Not human" so they can get the form to work. That's why a CAPTCHA works...

          And it's not a huge inconvenience. If your users really want security, they'll accept that you're taking a step in the right direction for them. If not, they'll get screwed at some point, either at your site or somewhere else, because they're sacrificing security for convenience.

            Spammers are unlikely to write a robot for your form specifically. Most of these bots are really stupid.

            Therefore, a "Check this box if you are human" checkbox or similar WOULD work.

            Or something really simple, like having them type some text from the page into a field.

            Spammers don't write robots themselves; they use shoddy off-the-shelf ones. They're totally hopeless and are designed to work only with a small number of really bad forms. They will POST anything to just about anything.

            There are no humans controlling these; in most cases no human ever even checks if the robot is successful. They simply don't care.

            Conclusion: Don't use a CAPTCHA unless you're making the next Google mail. Spammers won't target your site specifically unless you ARE Google.

            Mark

              To protect against spam bots, the first thing to do is check the HTTP Referer header in the form processor:

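              // Check for a wrong referer: assume a spam attack and abort
              $referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
              $allowed = array('http://yourdomain.com/contact.php', 'http://www.yourdomain.com/contact.php');
              if (!in_array($referer, $allowed, true)) {
                  // The referer is attacker-controlled, so escape it before echoing
                  echo htmlspecialchars($referer);
                  echo 'SPAM ATTACK!';
                  die;
              }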
              

              Then you need to check the user's email address input for mail-form (header injection) exploits:

              // Return true only if the email address contains no newlines, raw or
              // URL-encoded; a newline lets spammers inject extra mail headers
              function has_no_newlines($text)
              {
                  return preg_match("/(%0A|%0D|\\n+|\\r+)/i", $text) == 0;
              }

              Finally, you need to look for email headers in the textarea input:

              // Return true only if the text contains no mail-header-like strings
              // (Content-Type:, To:, Cc:, Bcc:) that could be injected into the message
              function has_no_emailheaders($text)
              {
                  return preg_match("/(content-type:|to:|cc:|bcc:)/i", $text) == 0;
              }
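Here is a minimal sketch of how the two functions above might be combined before calling mail(). The wrapper name is_clean_submission() is hypothetical, the two checks are repeated so the snippet runs on its own, and the filter_var() email validation is an added check that is not part of the original post.

```php
<?php
// Sketch: run both checks on the submitted fields before calling mail().
// is_clean_submission() is a hypothetical wrapper; the two checks from the
// post are repeated here so the snippet runs on its own.

function has_no_newlines($text)
{
    return preg_match("/(%0A|%0D|\\n+|\\r+)/i", $text) == 0;
}

function has_no_emailheaders($text)
{
    return preg_match("/(content-type:|to:|cc:|bcc:)/i", $text) == 0;
}

// True only if the address is one line with no header-like text, the comments
// contain no header-like text, and (added check) PHP accepts the address as valid
function is_clean_submission($email, $comments)
{
    return has_no_newlines($email)
        && has_no_emailheaders($email)
        && has_no_emailheaders($comments)
        && filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

// Usage in the form handler (hypothetical field names):
// if (is_clean_submission($_POST['email'], $_POST['comments'])) { mail(...); }
```

The header pattern is deliberately blunt: a message that innocently contains "to:" will also be rejected, so you may prefer to queue such messages rather than drop them.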

              Hope this helps; it works for me.

              Unfortunately you cannot stop them sending you crap (most of my blog comments are like that), but you can at least stop them from exploiting your form to spam other people.

                Except that the referer is a vulnerable exploit, and thus can't be trusted.

                What you might look into is bbprotect. They can help you limit spam posts, and they have an open API now.

                  bpat1434 wrote:

                  Except that the referer is a vulnerable exploit, and thus can't be trusted.

                  Quite true; that is why it is only the first step, but a worthwhile step all the same.

                    Considering that sending the "Referer" header isn't required by either HTTP/1.0 or HTTP/1.1, and that some proxies, gateways, browsers and plugins block or alter it, I'm not so sure I would want to rely on it.

                      Mark, thanks for your confirmation. Going to try this on our sites. Something like:

                      Note: This area is to prevent Spam. Please ignore it

                      [ ] Check this checkbox only if you are not human!
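A minimal server-side sketch of that checkbox, assuming it is rendered as a checkbox input named not_human (the name and the helper ticked_not_human_box() are hypothetical). Browsers leave unticked checkboxes out of the POST data entirely, so the mere presence of the field means something ticked it. A bot that leaves the box alone still gets through, so this only filters the "tick everything" kind of bot.

```php
<?php
// Hypothetical field: <input type="checkbox" name="not_human">
// Unticked checkboxes are not submitted at all, so presence alone means "ticked".
function ticked_not_human_box(array $post)
{
    return isset($post['not_human']);
}

// Usage in the form handler:
// if (ticked_not_human_box($_POST)) { exit; } // quietly drop the submission
```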

                      I'll see what happens, as we do get quite a few on our site.

                      Best

                      Ade

                        A checkbox doesn't offer enough possible values to block robots. They could simply ignore it and still get through.

                        Referrer checking will block legitimate users with browsers which don't send referrers (or firewalls which delete them).

                        Personally, I'd just add a field that must contain a fixed string:

                        Please enter the word "monkey" into the following field:

                        <input type="text" name="check" value="">

                        Or something like that.

                        The robot won't know what it needs to enter into which field, therefore it will probably get it wrong.

                        Yes, it would be trivial to engineer a robot to pass this check, but I doubt very much whether spammers would do so, unless you are Google.
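The server-side half of the "monkey" field might look like this minimal sketch. The field name check comes from the HTML in the post; the helper name passed_word_check() is hypothetical, and accepting any capitalisation (plus surrounding spaces) is a design choice so that "Monkey" typed by a human isn't rejected.

```php
<?php
// Returns true only if the "check" field holds the expected word.
// Missing field counts as a failure; comparison ignores case and outer spaces.
function passed_word_check(array $post, $expected = 'monkey')
{
    $answer = isset($post['check']) ? trim($post['check']) : '';
    return strcasecmp($answer, $expected) === 0;
}

// Usage in the form handler:
// if (!passed_word_check($_POST)) { exit; } // wrong or missing answer: probably a robot
```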

                        Mark

                          Or you could just as easily implement a graphical captcha. There are plenty of open source captcha scripts out there.

                            Well, I never knew that firewalls stripped out the referrer. 😕

                            I couldn't care less about oddball browsers that make up 0.025% of users, but firewalls are another matter; there are loads of them around. I guess I'll have to change my strategy :mad:

                              Not every firewall product strips the referrer by default, but some do.

                              Some are proxy-style products installed by companies; others are "personal firewall" products which either filter traffic at the IP layer or install browser plugins that modify outgoing headers.

                              If the referrer has been stripped, you will often see funny-looking headers like XXXXX:--------

                              Apparently quite a lot of "personal firewall" products do this, some by default.
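Given that the header is optional and often stripped or mangled, a more forgiving variant of the earlier referer check would reject only a referer that is present and points at a foreign host, and let a missing one through. This is a sketch under that assumption: referer_looks_foreign() is a hypothetical helper, and yourdomain.com is the same placeholder used earlier in the thread. The header remains trivially spoofable, so treat the result as one weak signal rather than real protection.

```php
<?php
// Reject only a referer that is present AND names a host we don't own.
// A missing or unparseable referer is treated as "unknown", not hostile.
function referer_looks_foreign($referer, array $allowed_hosts)
{
    if ($referer === null || $referer === '') {
        return false; // stripped or never sent: don't punish the user
    }
    $host = parse_url($referer, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // no host part (e.g. "--------"): treat as missing
    }
    return !in_array(strtolower($host), $allowed_hosts, true);
}

// Usage in the form handler:
// $referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : null;
// if (referer_looks_foreign($referer, array('yourdomain.com', 'www.yourdomain.com'))) { exit; }
```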

                              Mark
