I know that preg_replace is more advanced, as it supports regex matching,
but I downloaded these user input sanitization routines from http://www.owasp.org, and there's one routine that intrigued me.
the original routine was:
// sanitize a string for HTML (make sure nothing gets interpreted!)
function sanitize_html_string($string)
{
  // regex patterns, one per character to escape
  $pattern[0] = '/\&/';
  $pattern[1] = '/</';
  $pattern[2] = "/>/";
  $pattern[3] = '/\n/';
  $pattern[4] = '/"/';
  $pattern[5] = "/'/";
  $pattern[6] = "/%/";
  $pattern[7] = '/\(/';
  $pattern[8] = '/\)/';
  $pattern[9] = '/\+/';
  $pattern[10] = '/-/';
  // replacement for each pattern, matched by index
  $replacement[0] = '&amp;';
  $replacement[1] = '&lt;';
  $replacement[2] = '&gt;';
  $replacement[3] = '<br>';
  $replacement[4] = '&quot;';
  $replacement[5] = '& #39;';
  $replacement[6] = '& #37;';
  $replacement[7] = '& #40;';
  $replacement[8] = '& #41;';
  $replacement[9] = '& #43;';
  $replacement[10] = '& #45;';
  return preg_replace($pattern, $replacement, $string);
}
(Note that I have added a SPACE inside each numeric entity, e.g. '& #39;', so that vBulletin doesn't convert it.)
here's my routine:
function sanitize_html_string($string)
{
  // plain strings this time, so no delimiters or backslashes
  $pattern[0] = '&';
  $pattern[1] = '<';
  $pattern[2] = ">";
  $pattern[3] = "\n";
  $pattern[4] = '"';
  $pattern[5] = "'";
  $pattern[6] = '%';
  $pattern[7] = '(';
  $pattern[8] = ')';
  $pattern[9] = '+';
  $pattern[10] = '-';
  // replacement for each search string, matched by index
  $replacement[0] = '&amp;';
  $replacement[1] = '&lt;';
  $replacement[2] = '&gt;';
  $replacement[3] = '<br>';
  $replacement[4] = '&quot;';
  $replacement[5] = '& #39;';
  $replacement[6] = '& #37;';
  $replacement[7] = '& #40;';
  $replacement[8] = '& #41;';
  $replacement[9] = '& #43;';
  $replacement[10] = '& #45;';
  return str_replace($pattern, $replacement, $string);
}
I ran a test string (containing the target illegal characters, of course) through both functions and the results were an exact match.
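For anyone who wants to repeat that comparison, here is a minimal sketch. The sample input and the variable names are my own assumptions, and the numeric entities are written without the forum-only space. It builds the regex patterns from the plain strings with preg_quote() and checks that both calls return the same thing:

```php
<?php
// Characters to escape and their replacements, in the same order as the routines above.
$search  = ['&', '<', '>', "\n", '"', "'", '%', '(', ')', '+', '-'];
$replace = ['&amp;', '&lt;', '&gt;', '<br>', '&quot;',
            '&#39;', '&#37;', '&#40;', '&#41;', '&#43;', '&#45;'];

// Turn each literal into a regex. preg_quote() escapes regex metacharacters
// and the '/' delimiter, so no backslashes have to be written by hand.
$patterns = array_map(function ($s) {
    return '/' . preg_quote($s, '/') . '/';
}, $search);

// Made-up test input containing every target character.
$input = "<a href=\"x.php?a=1&b=2\">it's 100% (a+b)-c</a>\nnext line";

// Both functions apply the pairs one after another, in array order.
$viaPreg = preg_replace($patterns, $replace, $input);
$viaStr  = str_replace($search, $replace, $input);

var_dump($viaPreg === $viaStr); // bool(true)
```

Both functions work through the arrays in order, each pair operating on the result of the previous one. That is why '&' has to stay first: if it came later, it would re-escape the '&' inside entities that earlier replacements had already produced.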
So it makes me wonder why the good developer used preg_replace instead of str_replace, which I feel is faster and more suitable here, since every target is a fixed character and none of preg_replace's regex features are actually used. Plus, I don't have to mess with backslash-escaping each target character as one would when using preg_replace...
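The "faster" part is only a feeling until it's measured, so here is a rough timing sketch. The helper time_it(), the sample input, and the iteration count are all assumptions of mine, and the numbers will vary with the data and the PHP build, so take it as a way to check rather than a result:

```php
<?php
// Hypothetical helper: runs $fn $iterations times and returns elapsed seconds.
function time_it(callable $fn, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$search   = ['&', '<', '>', "\n", '"', "'", '%', '(', ')', '+', '-'];
$patterns = ['/\&/', '/</', '/>/', '/\n/', '/"/', "/'/", '/%/', '/\(/', '/\)/', '/\+/', '/-/'];
$replace  = ['&amp;', '&lt;', '&gt;', '<br>', '&quot;',
             '&#39;', '&#37;', '&#40;', '&#41;', '&#43;', '&#45;'];

// Made-up input; swap in something that looks like your real data.
$input = str_repeat("<p class=\"x\">5% of (a+b)-c isn't much & so on</p>\n", 100);

$strTime = time_it(function () use ($search, $replace, $input) {
    str_replace($search, $replace, $input);
}, 1000);

$pregTime = time_it(function () use ($patterns, $replace, $input) {
    preg_replace($patterns, $replace, $input);
}, 1000);

printf("str_replace:  %.4f s\npreg_replace: %.4f s\n", $strTime, $pregTime);
```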
any takes on this?
On a side note: is it really necessary to sanitize an HTML string this way? Whatever happened to good old urlencode or htmlspecialchars? Makes me wonder why the guy had to create such routines.
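To make the comparison concrete, here is a small sketch of what the built-ins do with the same kind of input (the sample string is made up). With ENT_QUOTES, htmlspecialchars() covers &, <, >, " and ', but it leaves %, (, ), + and - alone, and line breaks need nl2br(). urlencode() is meant for URL components rather than HTML output, so it isn't a like-for-like substitute.

```php
<?php
// Made-up input containing every character the routine above targets.
$input = "<b>it's 100% (a+b)-c & \"done\"</b>\nnext";

// ENT_QUOTES makes htmlspecialchars() escape single quotes as well as double quotes.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// nl2br() inserts a line-break tag before each newline. Unlike the '\n' => '<br>'
// rule above, it keeps the newline and emits '<br />' by default.
$withBreaks = nl2br($escaped);

echo $withBreaks, "\n";
```

So the routine above is roughly htmlspecialchars() plus nl2br(), with extra escaping of %, (, ), + and - on top. Whether those extra characters need escaping depends on where the output ends up.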
just me and my wondering mind..