header('Content-Type: text/xml; charset=ISO-8859-1');

echo "<?xml version=\"1.0\" encoding=\"ISO-8859-1\" ?>\n";

Because I pull the data from a third-party database that uses latin1 (ISO-8859-1), my first thought was to set the XML character set to ISO-8859-1 as well.

But the data I get from the latin1 database still contains UTF-8 characters, and those don't display well. I tried running utf8_decode on the values, but it only converts some of the characters back; the others turn into a question mark "?".

Another approach: even though the database is latin1, I would still use the UTF-8 character set for the XML.

header('Content-Type: text/xml; charset=UTF-8');

echo "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n";

Then I would have to run utf8_encode on the values. I would also have to detect whether a value is already UTF-8 (even though it comes from a latin1 database) and, if so, not utf8_encode it again.
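
Roughly, this is the kind of check I have in mind. It is only a sketch: it assumes the mbstring extension is available, `to_utf8` is a name I made up, and it uses mb_convert_encoding in place of utf8_encode (which performs the same ISO-8859-1 to UTF-8 conversion but is deprecated as of PHP 8.2). The detection is a heuristic, since some latin1 byte sequences also happen to be valid UTF-8.

```php
<?php
// Hypothetical helper: return $value as UTF-8, converting only when needed.
function to_utf8(string $value): string
{
    // Already valid UTF-8 (plain ASCII included): leave it untouched
    // so it does not get double-encoded.
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }

    // Assumption: anything that is not valid UTF-8 is ISO-8859-1,
    // because that is what the database claims to store.
    return mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
}
```

With this, a latin1 "é" (byte 0xE9) would be converted, while a value that already holds the UTF-8 bytes 0xC3 0xA9 would pass through unchanged.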

Would that be the right approach? What is your experience in similar situations?

And how did you deal with the other issues in this situation, such as htmlentities-encoded values in the database, Windows symbols, etc.?
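
To make those two cases concrete, here is a rough sketch of what I imagine the cleanup would look like. It assumes the mbstring extension, assumes the "Windows symbols" are Windows-1252 bytes such as curly quotes (0x93, 0x94), and `clean_for_xml` is a made-up name, so please correct me if this is the wrong idea.

```php
<?php
// Hypothetical helper: normalise a database value for output in a UTF-8 XML document.
function clean_for_xml(string $value): string
{
    // Assumption: bytes that are not valid UTF-8 are Windows-1252, which
    // matches ISO-8859-1 for 0xA0-0xFF and adds symbols in 0x80-0x9F.
    if (!mb_check_encoding($value, 'UTF-8')) {
        $value = mb_convert_encoding($value, 'UTF-8', 'Windows-1252');
    }

    // Turn stored HTML entities (e.g. &eacute;) back into real characters.
    $value = html_entity_decode($value, ENT_QUOTES | ENT_HTML401, 'UTF-8');

    // Escape only the characters that are special in XML.
    return htmlspecialchars($value, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}
```

The idea is to decode everything to plain UTF-8 first and escape once at the end, so that a stored `&amp;` is not escaped a second time in the XML output.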
