Here's one I made earlier, to defend against some particularly unpleasant bots that send crud which makes my application fail in unusual ways.
Hopefully it's fairly self-explanatory; feel free to comment on anything that isn't clear:
<?php
/*
* Validate a request against possible nasties.
*
* Scans all GET, POST and COOKIE keys and values, plus PATH_INFO, for:
*
* ASCII control characters (< 0x20) except \r, \n and tab, and DEL (0x7f).
*
* Keys may contain printable ASCII only.
*
* GET values, cookies and PATH_INFO are not allowed to contain newlines either.
*/
function ValidateIncomingRequest()
{
    foreach (array_keys($_GET) as $key) {
        _CheckNoControlCharacters($_GET[$key], $key, "GET field", false);
    }
    foreach (array_keys($_POST) as $key) {
        _CheckNoControlCharacters($_POST[$key], $key, "POST field", true);
    }
    foreach (array_keys($_COOKIE) as $key) {
        _CheckNoControlCharacters($_COOKIE[$key], $key, "Cookie", false);
    }
    // PATH_INFO is only set when the URL has extra path components,
    // so check it exists to avoid an undefined-index warning.
    if (isset($_SERVER['PATH_INFO'])) {
        _CheckNoControlCharacters($_SERVER['PATH_INFO'], 'path info', 'path info', false);
    }
}
function _CheckNoControlCharacters(& $value, $key, $typename, $allow_newlines)
{
    // Check that the key doesn't contain any invalid chars.
    // Keys must only contain printable ascii.
    // The D modifier stops $ from matching before a trailing newline.
    if (! preg_match('/^[\\x20-\\x7e]+$/D', $key)) {
        _FailRequestValidation("Invalid key in $typename", $key);
    }
    // Value is an array (e.g. foo[]=1&foo[bar]=2): check every element.
    // Recursing handles nested arrays and non-numeric keys, and each
    // combined name such as "foo[bar]" has its key characters checked too.
    if (is_array($value)) {
        foreach (array_keys($value) as $subkey) {
            _CheckNoControlCharacters($value[$subkey], $key . '[' . $subkey . ']', $typename, $allow_newlines);
        }
    } else {
        _CheckRequestValue($value, $key, $typename, $allow_newlines);
    }
}
function _CheckRequestValue(& $value, $key, $typename, $allow_newlines)
{
    // Check that the value doesn't contain invalid chars.
    // It may contain any printable character in an 8-bit charset
    // (i.e. \x80 - \xff as well), plus \t (tab),
    // and \r (CR) and \n (LF) only if newlines are allowed.
    // The D modifier stops $ from matching before a trailing newline,
    // which would otherwise let a single "\n" slip through.
    if ($allow_newlines) {
        $re = '/^[\\x20-\\x7e\\x80-\\xff\\t\\r\\n]*$/D';
    } else {
        $re = '/^[\\x20-\\x7e\\x80-\\xff\\t]*$/D';
    }
    if (! preg_match($re, $value)) {
        _FailRequestValidation("Invalid value for $typename $key", $value);
    }
}
function _FailRequestValidation($error, $value)
{
    $value_hex = bin2hex($value);
    $ip = $_SERVER['REMOTE_ADDR'];
    error_log("Request validation failed for client $ip: $error");
    error_log("Failed value (hex): " . $value_hex);
    header("HTTP/1.0 400 Bad Request");
    header("Content-Type: text/plain");
    echo "Invalid request.";
    exit;
}
?>
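If you want to unit-test these rules without the script exiting, the checks can be factored into a pure function that returns the names of the offending fields. This is only a minimal sketch, not part of the code above: the name `find_invalid_fields()` is hypothetical, it applies the same character classes, recurses into nested arrays such as `a[b][c]=...` so every element is checked, and uses the `D` regex modifier so that `$` cannot match before a trailing newline.

```php
<?php
// Hypothetical helper (not part of the original code): returns the names of
// fields whose key or value breaks the same rules, instead of exiting.
function find_invalid_fields(array $fields, bool $allow_newlines, string $prefix = ''): array
{
    // Printable ASCII plus 0x80-0xFF and tab; CR/LF only when allowed.
    // D = "dollar end only", so a trailing "\n" is not silently accepted.
    $value_re = $allow_newlines
        ? '/^[\x20-\x7e\x80-\xff\t\r\n]*$/D'
        : '/^[\x20-\x7e\x80-\xff\t]*$/D';

    $bad = [];
    foreach ($fields as $key => $value) {
        $name = ($prefix === '') ? (string)$key : $prefix . '[' . $key . ']';

        // Keys must be non-empty printable ASCII.
        if (!preg_match('/^[\x20-\x7e]+$/D', (string)$key)) {
            $bad[] = $name;
            continue;
        }
        if (is_array($value)) {
            // Recurse so nested and associative arrays are fully checked.
            $bad = array_merge($bad, find_invalid_fields($value, $allow_newlines, $name));
        } elseif (!preg_match($value_re, (string)$value)) {
            $bad[] = $name;
        }
    }
    return $bad;
}

// Example: a control character in an array element, and a CRLF value.
$sample = ['q' => 'hello', 'tags' => ['ok', "bad\x01"], 'note' => "a\r\nb"];
$bad_as_get  = find_invalid_fields($sample, false); // ['tags[1]', 'note']
$bad_as_post = find_invalid_fields($sample, true);  // ['tags[1]']
```

Returning a list rather than calling `exit` keeps the rules testable; a caller can still log the names and send the 400 response itself.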
ValidateIncomingRequest() assumes that you will never submit a form with newline-containing fields using the GET method. In my applications this assumption is safe (hint: a textarea in a GET form is a bad idea).
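To see why the textarea hint matters, here is a small sketch of what such a form would put in the query string. It assumes the browser submits the textarea's line breaks as CRLF (the HTML form-submission rules normalise them that way), which URL-encodes to `%0D%0A`; `parse_str()` is used here to decode the query string the way PHP fills `$_GET`. The field name `comment` is just an example.

```php
<?php
// Hypothetical query string from a GET form containing a two-line textarea.
// "+" decodes to a space and %0D%0A decodes to CR LF.
$query = 'comment=line+one%0D%0Aline+two';

// parse_str() decodes it much as PHP populates $_GET.
parse_str($query, $fields);

$get_re  = '/^[\x20-\x7e\x80-\xff\t]*$/D';      // GET/cookie rule: no CR/LF
$post_re = '/^[\x20-\x7e\x80-\xff\t\r\n]*$/D';  // POST rule: CR/LF allowed

$ok_as_get  = preg_match($get_re, $fields['comment']);  // 0: would be rejected
$ok_as_post = preg_match($post_re, $fields['comment']); // 1: accepted
```

So the same text is fine in a POST body but would be rejected with a 400 if the form used GET.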
Mark