We have a fully UTF-8 system, and a client has asked for output in which certain characters are replaced by their NCR (numeric character reference) equivalents.

e.g. © as &#169;

Is there a way to do this in PHP, either on a string or character basis?

Thanks all.

    What does "certain characters" mean? Which ones?

    For example, this code:

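    $string = 'Hello world, © 2008!';
    // Replace anything that is not a letter, digit or space with &#N; (N = ord() of the matched byte).
    // A callback is needed because preg_replace()'s /e modifier was removed in PHP 7.
    $string = preg_replace_callback('/[^a-z0-9 ]/i', function ($m) {
        return '&#' . ord($m[0]) . ';';
    }, $string);

    echo $string;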

    outputs (as rendered in a browser):

    Hello world, © 2008!
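A minimal sketch of a UTF-8-aware variant of the code above, assuming PHP 7.2 or later with the mbstring extension (for mb_ord()) and a source file saved as UTF-8. The /u modifier makes PCRE match whole UTF-8 characters instead of single bytes, and mb_ord() returns the character's code point rather than the value of its first byte. The function name to_ncr() is hypothetical, and this version converts every non-ASCII character rather than a specific list.

```php
<?php
// Hypothetical helper: replace every non-ASCII character with &#N;,
// where N is its Unicode code point.
// Assumes PHP >= 7.2 with mbstring, and a UTF-8 source file.
function to_ncr(string $s): ?string
{
    // /u: treat pattern and subject as UTF-8, so each match is one full character.
    // Returns null if $s is not valid UTF-8.
    return preg_replace_callback('/[^\x00-\x7F]/u', function (array $m): string {
        // mb_ord() gives the code point (e.g. 169 for ©), not the first byte (194).
        return '&#' . mb_ord($m[0], 'UTF-8') . ';';
    }, $s);
}

echo to_ncr('Hello world, © 2008!'), "\n"; // Hello world, &#169; 2008!
```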

      Thanks for that. They've given us a list of 201 or so characters: the £ sign, alpha, beta and so on.

      I got as far as this for testing:

      $output = '';
      for ($i = 0; $i < strlen($string); $i++) {
          $output .= "&#" . ord($string[$i]) . ";";
      }

      But both that and yours give mangled output in the browser: an input string of "£ xxx © ¤ Σ α ß" comes out with stray characters, e.g. "£" is displayed as "Â£" and "©" as "Â©".

      For example, the pound sign has become &#194;&#163; instead of the expected &#163;.

      Is there something about these particular characters that ord() doesn't like?

        Just found a statement that ord() breaks on multibyte (Unicode) characters, so I'm off to do more research. Thanks again for the help.

        edit: Found this package, which seems to do the job well: http://hsivonen.iki.fi/php-utf8

        So we're on the way to a full solution.
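For comparison, PHP's mbstring extension has a built-in mb_encode_numericentity(), which may cover the same need without a separate package. A sketch, assuming a UTF-8 source file and that converting every code point from U+0080 upward is acceptable. The conversion map is a flat list of [start, end, offset, mask] groups, so narrower ranges could restrict the conversion to particular blocks.

```php
<?php
// Sketch: convert every code point in U+0080..U+10FFFF to a decimal NCR.
// Each map group is [start, end, offset, mask]; a matching code point c
// becomes "&#N;" with N = (c + offset) & mask.
// Assumption: everything non-ASCII should be converted, not just a fixed list.
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];

$input = '£ xxx © ¤ Σ α ß';
echo mb_encode_numericentity($input, $convmap, 'UTF-8'), "\n";
// &#163; xxx &#169; &#164; &#931; &#945; &#223;
```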

          Well, that's because the file is UTF-8, which means that £ is stored as two bytes, 0xC2 0xA3. ord() only works on a byte-by-byte basis, so you get one reference per byte (&#194;&#163;), and those two bytes happen to look like "Â£" when interpreted as Latin-1.
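To see this effect directly, a small check (assuming a UTF-8 source file and PHP 7.2+ with mbstring for mb_ord()) comparing the byte view and the character view of £:

```php
<?php
// Compare byte-level and character-level views of "£" (U+00A3).
// Assumes this file is saved as UTF-8 and mbstring is available.
$pound = '£';

echo strlen($pound), "\n";          // 2: UTF-8 needs two bytes for U+00A3
echo bin2hex($pound), "\n";         // c2a3
echo ord($pound), "\n";             // 194: ord() only reads the first byte
echo mb_ord($pound, 'UTF-8'), "\n"; // 163: the actual code point

// The byte-by-byte loop from the thread, applied to "£":
$output = '';
for ($i = 0; $i < strlen($pound); $i++) {
    $output .= '&#' . ord($pound[$i]) . ';';
}
echo $output, "\n";                 // &#194;&#163; (displays as "Â£")
```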
          If the table is fixed, you could just write it up by hand:

          $mapping = array(
              "£" => "&#163;",
              "©" => "&#169;",
              "Σ" => "&Sigma;", // I'm guessing here: what does "NCR" mean?
              // ...
          );

          $output = str_replace(array_keys($mapping), array_values($mapping), $input);
          

          Provided that file is saved as UTF-8, the keys will be the right byte sequences.
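With 201 or so characters, the table could also be generated rather than typed. A sketch, assuming PHP 7.2+ with mbstring and a UTF-8 source file; $clientChars is a hypothetical stand-in for the client's list. It uses strtr() with an array, which tries the longest keys first and does not re-scan text it has already replaced.

```php
<?php
// Build the character => NCR table from a list of characters.
// $clientChars is a hypothetical placeholder for the client's ~201 characters.
// Assumes PHP >= 7.2 with mbstring, and a UTF-8 source file.
$clientChars = ['£', '©', 'Σ', 'α', 'β'];

$mapping = [];
foreach ($clientChars as $char) {
    // mb_ord() returns the code point, e.g. 163 for "£".
    $mapping[$char] = '&#' . mb_ord($char, 'UTF-8') . ';';
}

// Only characters present in $mapping are converted.
$input = '£ xxx © ¤ Σ α ß';
echo strtr($input, $mapping), "\n";
// &#163; xxx &#169; ¤ &#931; &#945; ß   (¤ and ß are not in the list)
```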

            Thanks, looks good. NCR = Numerical Character Reference.
