We have a fully-UTF-8 system, and a client has asked for output with certain characters as their NCR equivalent.
e.g. © as & #169;
Is there a way to do this in PHP, either on a string or character basis?
Thanks all.
What does "certain characters" mean? Which ones?
For example, this code:
$string = 'Hello world, © 2008!';
$string = preg_replace('/([^a-z0-9 ])/ie', '"&#" . ord("$1") . ";"', $string);
echo $string;
outputs:
Hello world, © 2008!
Thanks for that. They've given us a list of 201 or so characters, £ sign, alpha, beta and so on.
I got as far as this for testing:
for ($i = 0; $i < strlen($string); $i++) {
    $output .= "&#" . ord($string[$i]) . ";";
}
But both that and yours give mangled output in the browser: an input string of "£ xxx © ¤ Σ α ß" is displayed as "Â£ xxx Â© Â¤ Î£ Î± ÃŸ".
E.g. the pound sign has become & #194;& #163;
instead of the expected & #163;
(gaps deliberate)
Is there something about these particular characters that ord() doesn't like, maybe?
Just found a statement that ord() breaks on Unicode, so off to do more research. Thanks again for the help.
edit: Found this package, which seems to do the job well: http://hsivonen.iki.fi/php-utf8
So we're on the way to a full solution.
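For anyone finding this later, here is a minimal sketch of a UTF-8-aware version that needs no extra package. It assumes PHP 7.2+ with the mbstring extension (for mb_ord()) and valid UTF-8 input; utf8_to_ncr is just a name made up for this example. It converts every non-ASCII character, so it is not yet restricted to the client's list.

```php
<?php
// Sketch: replace every non-ASCII character with its decimal NCR.
// Assumptions: PHP 7.2+ with mbstring, valid UTF-8 input.
// utf8_to_ncr() is a hypothetical helper name, not a built-in.
function utf8_to_ncr(string $input): string
{
    // The /u modifier makes PCRE treat the subject as UTF-8, so each
    // match is one whole character rather than one byte.
    $result = preg_replace_callback(
        '/[^\x00-\x7F]/u',
        function (array $m): string {
            // mb_ord() returns the Unicode code point of the character
            return '&#' . mb_ord($m[0], 'UTF-8') . ';';
        },
        $input
    );

    // preg_replace_callback() returns null on invalid UTF-8;
    // fall back to the untouched input in that case.
    return $result ?? $input;
}

echo utf8_to_ncr("£ xxx © ¤ Σ α ß"), "\n";
```

With the test string from above, the pound sign comes out as one reference (163) instead of two (194 and 163).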
Well, that's because the file is UTF-8, which means that £ is stored as two bytes, "0xc2 0xa3" (ord() only works on a byte-by-byte basis), and those two bytes just happen to look like Â£ when interpreted as Latin-1.
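A quick way to see this for yourself, as a small sketch that assumes the script file itself is saved as UTF-8:

```php
<?php
// Assumption: this source file is saved as UTF-8.
$pound = "£";

echo strlen($pound), "\n";   // 2, because strlen() counts bytes, not characters
echo bin2hex($pound), "\n";  // c2a3, the raw bytes in hex

// ord() looks at a single byte, so the loop in the earlier post sees two values
echo ord($pound[0]), ' ', ord($pound[1]), "\n"; // 194 163
```

Those two numbers are exactly the two references that showed up in the mangled output.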
If the table is fixed, you could just write it up by hand:
$mapping = array(
    "£" => "&#163;",
    "©" => "&#169;",
    "Σ" => "&#931;", // I'm guessing here: what does "NCR" mean?
    ...
);
$output = str_replace(array_keys($mapping), array_values($mapping), $input);
Provided you use UTF-8 for that file, the keys will be the right byte sequences.
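If typing out 200-odd entries by hand sounds error-prone, the table could probably be generated from the list of characters instead. A rough sketch, assuming PHP 7.2+ with mbstring and a UTF-8 source file; $wanted is a made-up stand-in for the client's real list:

```php
<?php
// Sketch: build the NCR table from a list of characters.
// Assumptions: PHP 7.2+ with mbstring; file saved as UTF-8.
// $wanted is a hypothetical stand-in for the client's real list.
$wanted = ['£', '©', 'Σ', 'α', 'β'];

$mapping = [];
foreach ($wanted as $char) {
    // mb_ord() gives the code point of the whole character, not of one byte
    $mapping[$char] = '&#' . mb_ord($char, 'UTF-8') . ';';
}

// strtr() with an array replaces all keys in a single pass and never
// re-scans text it has already replaced.
$input  = 'Price: £5 © 2008, Σ α β, ß stays';
$output = strtr($input, $mapping);

echo $output, "\n";
```

Characters that are not in the list (ß here) pass through untouched. I used strtr() rather than str_replace() because str_replace() applies the pairs one after another, which could bite if the list ever contained "&", "#", ";" or digits.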
Thanks, looks good. NCR = Numerical Character Reference.