Hey All,

I have a record in my database that holds the character Á. When I try to output it as XML like so:

<?xml version="1.0" encoding="utf-8"?>
<response>
<description><![CDATA[Á - <b>special character test</b>]]></description>
</response>

I get the error:
XML Parsing Error: not well-formed
Which refers to the Á character.

I can add this before the XML is created:

$value = htmlentities($value);

Which fixes the 'not well formed' error, but also converts the <b> tags which I want to keep in place.

Is there a way to convert only the Á character (or any other non-English characters) but keep the HTML tags in place?

Cheers

    Your XML declares a character set of UTF-8. This is a multi-byte charset which supports pretty much every language in the world.

    I'm on thin ice here, but I'll try and explain. Hopefully weedpacket or bradgrafelman or nogdog or someone might chime in. Unfortunately, PHP does not by default use multibyte character sets when dealing with character strings and -- perhaps even more confusingly -- PHP may report a different value in this script depending on how you save the PHP file.

    <?php
    $str = 'ÁÁ';
    // strlen() counts bytes, not characters
    echo strlen($str);
    ?>

    If you save the file as Latin-1 (ISO-8859-1, which Windows editors often call "ANSI"), it will probably report "2". If you save it as UTF-8, it will report "4", because strlen counts bytes and Á takes two bytes in UTF-8.
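
    To take the editor's save encoding out of the picture, here is a rough sketch that spells out the bytes directly. The variable names are just for illustration, and mb_strlen assumes the mbstring extension is installed.

```php
<?php
// "\xC1" is Á as a single Latin-1 byte; "\xC3\x81" is Á as two UTF-8 bytes.
$latin1 = "\xC1\xC1";
$utf8   = "\xC3\x81\xC3\x81";

echo strlen($latin1), "\n";            // 2 (bytes)
echo strlen($utf8), "\n";              // 4 (bytes)
echo mb_strlen($utf8, 'UTF-8'), "\n";  // 2 (characters)
```

    The same two characters give different strlen results purely because of how they are encoded. That is also why an XML parser expecting UTF-8 chokes on a lone Latin-1 byte.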

    Instead of htmlentities, try using [man]utf8_encode[/man], which converts a Latin-1 (ISO-8859-1) string to UTF-8 and leaves the HTML tags alone.
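
    Something along these lines might do it. This is only a sketch: to_utf8 is a made-up helper name, and it assumes that anything which is not already valid UTF-8 came out of the database as Latin-1, which you would need to confirm for your own setup. It uses mb_convert_encoding from the mbstring extension rather than utf8_encode, since utf8_encode is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper: return $value as UTF-8.
// Assumption: any input that is not valid UTF-8 is Latin-1 (ISO-8859-1).
function to_utf8(string $value): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value; // already valid UTF-8, leave it untouched
    }
    return mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
}

// Á stored as a single Latin-1 byte, followed by the markup to preserve.
$value = "\xC1 - <b>special character test</b>";

$xml = '<?xml version="1.0" encoding="utf-8"?>' . "\n"
     . "<response>\n"
     . '<description><![CDATA[' . to_utf8($value) . "]]></description>\n"
     . "</response>\n";

echo $xml;
```

    The check before converting is there so that a value which is already UTF-8 does not get encoded twice. The <b> tags pass through unchanged because only the character encoding is touched, not the markup.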

    If that doesn't work, post back here and we'll try to figure it out.
