Thanks very much dalecosp for putting my mind at ease a little bit 🙂
Here's how I've implemented it.
First, this function is run on the URL. I do this right at the beginning and use the resulting array for various things in the site, one of which is the bad GET data:
function dissectURL($url){
$scheme = parse_url($url, PHP_URL_SCHEME);
// parse_url() returns null when the URL has no host, so cast before lowercasing
$host = strtolower((string) parse_url($url, PHP_URL_HOST));
if(substr($host, 0, 4) == 'www.'){
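// Cut exactly the four-character prefix; ltrim($host,"www.") would strip every leading "w" and "." character
$host = substr($host, 4);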
}
// $user = parse_url($url, PHP_URL_USER);
// $pass = parse_url($url, PHP_URL_PASS);
$path = parse_url($url, PHP_URL_PATH);
$path = ltrim((string) $path,"/");
$query = parse_url($url, PHP_URL_QUERY);
$fragment = parse_url($url, PHP_URL_FRAGMENT);
$returnURL['scheme'] = $scheme;
$returnURL['host'] = $host;
$returnURL['path'] = $path;
$returnURL['query'] = $query;
$returnURL['fragment'] = $fragment;
return $returnURL;
}
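For anyone copying the host handling: as far as I can tell, ltrim() treats its second argument as a list of characters rather than a prefix, so it can eat into the host name itself. Here is a minimal sketch of the prefix stripping done with substr() instead. The helper name stripWww() is just something made up for this illustration; it isn't part of the site code.

```php
<?php
// Hypothetical helper (not from the site code): drop one leading "www." from a host name.
function stripWww(string $host): string {
    $host = strtolower($host);
    // Check the first four characters, then cut exactly those four off.
    if (substr($host, 0, 4) === 'www.') {
        return substr($host, 4);
    }
    return $host;
}

// ltrim() reads "www." as the character set {w, .} and keeps stripping while it matches.
echo ltrim('www.web.com', 'www.'), PHP_EOL;   // eb.com
echo stripWww('www.web.com'), PHP_EOL;        // web.com
echo stripWww('WWW.Example.com'), PHP_EOL;    // example.com
```

Hosts that don't start with "www." pass through unchanged apart from the lowercasing.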
Then this is what I run to check for the % character:
// strpos() returns 0 for a match at the very start of the string, so compare strictly against FALSE
if (strpos($testquery, '%') !== FALSE) {
/* We've got a match on the percentage sign in query. Add them to the autoban list and direct them to the bad boy page. */
--DB INSERT HERE--
include("./includes/autobanned.html");
exit;
}
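One detail worth keeping in mind with a check like this: strpos() returns the position of the match, and a match at the very start of the string is position 0, which a loose comparison against TRUE treats as no match. Below is a rough sketch of the test wrapped in a function. The name queryHasPercent() is hypothetical, and I'm assuming the value passed in is the 'query' element from the array above, which is null when the URL has no query string.

```php
<?php
// Hypothetical helper: true when the query string contains a literal "%".
function queryHasPercent(?string $query): bool {
    // parse_url() gives null when the URL has no query string at all.
    if ($query === null) {
        return false;
    }
    // Strict comparison, so a match at position 0 still counts.
    return strpos($query, '%') !== false;
}

var_dump(strpos('%27', '%') == true);      // bool(false): position 0 is missed by the loose check
var_dump(queryHasPercent('%27'));          // bool(true)
var_dump(queryHasPercent('id=5&x=%27'));   // bool(true)
var_dump(queryHasPercent('id=5'));         // bool(false)
var_dump(queryHasPercent(null));           // bool(false)
```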
The site doesn't see much traffic yet, so there have been no real-world matches or false positives so far, but I'll keep a close eye on it to make sure I haven't screwed something up.