I know that preg_replace is more advanced, since it supports regex matching...

But I downloaded these user input sanitization routines from http://www.owasp.org, and there's one routine that intrigued me.

The original routine was:

// sanitize a string for HTML (make sure nothing gets interpreted!)
function sanitize_html_string($string)
{
  $pattern[0] = '/\&/';
  $pattern[1] = '/</';
  $pattern[2] = "/>/";
  $pattern[3] = '/\n/';
  $pattern[4] = '/"/';
  $pattern[5] = "/'/";
  $pattern[6] = "/%/";
  $pattern[7] = '/\(/';
  $pattern[8] = '/\)/';
  $pattern[9] = '/\+/';
  $pattern[10] = '/-/';
  $replacement[0] = '&amp;';
  $replacement[1] = '&lt;';
  $replacement[2] = '&gt;';
  $replacement[3] = '<br>';
  $replacement[4] = '&quot;';
  $replacement[5] = '& #39;';
  $replacement[6] = '& #37;';
  $replacement[7] = '& #40;';
  $replacement[8] = '& #41;';
  $replacement[9] = '& #43;';
  $replacement[10] = '& #45;';

  return preg_replace($pattern, $replacement, $string);
}

(Note that I have added a SPACE inside '& #39;' and the other numeric entities so that vBulletin doesn't convert them.)

Here's my routine:

function sanitize_html_string($string)
{
  $pattern[0] = '&';
  $pattern[1] = '<';
  $pattern[2] = ">";
  $pattern[3] = "\n";
  $pattern[4] = '"';
  $pattern[5] = "'";
  $pattern[6] = '%';
  $pattern[7] = '(';
  $pattern[8] = ')';
  $pattern[9] = '+';
  $pattern[10] = '-';
  $replacement[0] = '&amp;';
  $replacement[1] = '&lt;';
  $replacement[2] = '&gt;';
  $replacement[3] = '<br>';
  $replacement[4] = '&quot;';
  $replacement[5] = '& #39;';
  $replacement[6] = '& #37;';
  $replacement[7] = '& #40;';
  $replacement[8] = '& #41;';
  $replacement[9] = '& #43;';
  $replacement[10] = '& #45;';

  return str_replace($pattern, $replacement, $string);

}

I ran my string (containing the target illegal characters, of course) through each function, and the results were an exact match.
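For anyone who wants to repeat that comparison, here is a minimal sketch, assuming PHP 7 or later. The names sanitize_map, sanitize_with_str and sanitize_with_preg are made up for this example, and the numeric entities are written without the forum-workaround space. The regex patterns are built with preg_quote() instead of being escaped by hand.

```php
<?php
// Hypothetical helper: search => replacement, in the order they must be applied.
// '&' has to come first so the entities produced later are not re-escaped.
function sanitize_map(): array
{
    return [
        '&'  => '&amp;',
        '<'  => '&lt;',
        '>'  => '&gt;',
        "\n" => '<br>',
        '"'  => '&quot;',
        "'"  => '&#39;',
        '%'  => '&#37;',
        '('  => '&#40;',
        ')'  => '&#41;',
        '+'  => '&#43;',
        '-'  => '&#45;',
    ];
}

// Plain string replacement, as in the second routine above.
function sanitize_with_str(string $string): string
{
    $map = sanitize_map();
    return str_replace(array_keys($map), array_values($map), $string);
}

// Regex replacement, as in the original routine; preg_quote() does the slashing.
function sanitize_with_preg(string $string): string
{
    $map = sanitize_map();
    $patterns = array_map(function ($literal) {
        return '/' . preg_quote($literal, '/') . '/';
    }, array_keys($map));
    return preg_replace($patterns, array_values($map), $string);
}

$input = "<a href=\"x\">Tom & Jerry's 100% (fun) + more - ok</a>\nnext line";

$fromStr  = sanitize_with_str($input);
$fromPreg = sanitize_with_preg($input);

echo $fromStr, "\n";
echo ($fromStr === $fromPreg) ? "identical\n" : "different\n";
```

Both functions apply the replacements one after another in array order, which is why the outputs agree for literal, single-character targets like these.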

So it makes me wonder why the developer used preg_replace instead of str_replace, which "I feel" is faster and more suitable here than the more advanced preg_replace. Plus, I don't have to mess with backslash-escaping each target character, as one has to when using preg_replace...

Any takes on this?
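The "faster" part is a feeling until it is measured, so here is a rough timing sketch. The sample input, the repeat count and the iteration count are arbitrary choices for illustration, and the numbers will vary with PHP version and machine, so treat the result as an indication only.

```php
<?php
// Same targets as the two routines above (entities written without the forum space).
$search  = ['&', '<', '>', "\n", '"', "'", '%', '(', ')', '+', '-'];
$replace = ['&amp;', '&lt;', '&gt;', '<br>', '&quot;', '&#39;', '&#37;',
            '&#40;', '&#41;', '&#43;', '&#45;'];
$patterns = ['/&/', '/</', '/>/', '/\n/', '/"/', "/'/", '/%/',
             '/\(/', '/\)/', '/\+/', '/-/'];

// Arbitrary test data: one line repeated many times.
$input = str_repeat("<a href=\"x\">Tom & Jerry's 100% (fun) + more - ok</a>\n", 1000);
$iterations = 200;

// Time str_replace.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $viaStr = str_replace($search, $replace, $input);
}
$strSeconds = microtime(true) - $start;

// Time preg_replace.
$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $viaPreg = preg_replace($patterns, $replace, $input);
}
$pregSeconds = microtime(true) - $start;

printf("str_replace:  %.4f s\n", $strSeconds);
printf("preg_replace: %.4f s\n", $pregSeconds);
printf("same output:  %s\n", $viaStr === $viaPreg ? 'yes' : 'no');
```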

On a side note: is it really necessary to sanitize an HTML string this way? Whatever happened to good old urlencode or htmlspecialchars? It makes me wonder why the author had to create such routines.

Just me and my wondering mind...
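For comparison, here is a short sketch of what the built-in functions do with similar input. Note that htmlspecialchars() only escapes &, <, >, and the quote characters (it leaves %, parentheses, + and - alone), and urlencode() is meant for URL components rather than HTML output, so neither is a drop-in replacement for the routine above. The sample strings are made up.

```php
<?php
$input = "<script>alert('x')</script> & \"quotes\"";

// ENT_QUOTES escapes both double and single quotes.
$safe = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
echo $safe, "\n";
// &lt;script&gt;alert(&#039;x&#039;)&lt;/script&gt; &amp; &quot;quotes&quot;

// nl2br() inserts a <br /> before each newline and keeps the newline itself.
$withBreaks = nl2br("line one\nline two");
echo $withBreaks, "\n";

// urlencode() targets query-string values: space becomes '+', '&' becomes '%26'.
$encoded = urlencode("Tom & Jerry");
echo $encoded, "\n";
// Tom+%26+Jerry
```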

    In some ways this is a matter of choice. Some PHP programmers started in environments where they developed very strong regular expression skills (for example, Perl). Regular expressions allow for vast flexibility.

    In the code you show, preg_replace has no particular coding advantage over str_replace, and from your tests it appears that there is no efficiency advantage either.

    It takes me forever to write regular expressions, so I always use str_replace instead if I can. For simple search/replace, it sure does the trick.

    As for your "how come he wrote this?" questions, either:
    1. PHP functionality didn't exist in a prepackaged function at the time he wrote the code.
    2. PHP functionality existed, but he didn't know.
    3. He knew but felt like writing it his way anyway.

    There are many ways to skin cats these days.

      I see.

      As for your "how come he wrote this?" questions, either:
      1. PHP functionality didn't exist in a prepackaged function at the time he wrote the code.
      2. PHP functionality existed, but he didn't know.
      3. He knew but felt like writing it his way anyway.

      Or he just loves regular expressions because he is already an expert at them. lol

      But I'm with you, nemonoman: str_replace is fine and QUICK. 🙂

      thanks
