I need to parse XML files which contain a lot of ISO-8859-1 extended characters (basically names in Irish Gaelic).
When I run these through the parser I currently use for similar processing (new SimpleXMLElement($xmlstr), you know the one), the accented characters come out garbled.
For example, if I have an item:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<all>
<item>
<type>Memorial</type>
<name>SURNAME Éadaoín</name>
<text>In memory of Éadaoín.
Remembered by Ronán, Isibéal, Orla, Muireann and Dáire.
</text>
<date>2008-09-20</date>
</item>
</all>
Once it has gone through SimpleXMLElement, it comes out as:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<all>
<item>
<type>Memorial</type>
<name>SURNAME ÉadaoÃ*n</name>
<text>In memory of ÉadaoÃ*n.
Remembered by Ronán, Isibéal, Orla, Muireann and Dáire.
</text>
<date>2008-09-20</date>
</item>
</all>
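A likely explanation: SimpleXML (via libxml2) converts whatever encoding the document declares into UTF-8 internally, and every string it hands back is UTF-8, even when the file says encoding="ISO-8859-1". The minimal sketch below uses a made-up in-memory document as a stand-in for the real file and dumps the bytes of the parsed name.

```php
<?php
// Hypothetical stand-in for the real file: an ISO-8859-1 document built in memory.
// \xC9 is "É" and \xED is "í" in ISO-8859-1 (one byte each).
$latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\" ?>\n"
    . "<all><item><name>SURNAME \xC9adao\xEDn</name></item></all>";

$xml  = new SimpleXMLElement($latin1);
$name = (string) $xml->item->name;

// SimpleXML returns UTF-8: "É" becomes the two bytes c3 89 and "í" becomes c3 ad.
echo bin2hex($name), "\n";
```

If those UTF-8 bytes are later displayed or saved as ISO-8859-1, the lead byte 0xC3 shows up as "Ã" followed by a second stray character (0xAD, for instance, is an invisible soft hyphen), which would account for the "Ã" sequences in the output above.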
I've tried converting the characters before the string is passed to the parser:
$rawxml = file_get_contents($file);
// $chars / $replace: arrays of the accented characters and their replacements
$xmlstr = str_replace($chars, $replace, $rawxml);
$xml = new SimpleXMLElement($xmlstr);
At that point the correct replacements are in the string.
But once it is parsed, the characters seem to be decoded again and the same errors appear.
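That "decoding again" step is what you would expect if, as assumed here, $replace held numeric character references such as &#237; (the actual replacement values aren't shown, so this is an assumption). The parser resolves those references back to the characters themselves and, as with raw Latin-1 input, returns them as UTF-8. The iconv() call at the end is one possible option, sketched here, if the output really has to be ISO-8859-1.

```php
<?php
// Assumption: the pre-conversion turned "É" into &#201; and "í" into &#237;.
$escaped = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\" ?>\n"
    . "<all><item><name>SURNAME &#201;adao&#237;n</name></item></all>";

$xml  = new SimpleXMLElement($escaped);
$name = (string) $xml->item->name;

// The references are resolved and returned as UTF-8, so the replacement is undone.
var_dump($name === "SURNAME \xC3\x89adao\xC3\xADn"); // bool(true)

// One option if ISO-8859-1 is really needed downstream: convert SimpleXML's
// UTF-8 strings on the way out (requires the iconv extension).
$latin1Name = iconv('UTF-8', 'ISO-8859-1', $name);
var_dump($latin1Name === "SURNAME \xC9adao\xEDn"); // bool(true)
```

The alternative is to keep everything in UTF-8 end to end and declare that charset wherever the text is displayed or stored, which avoids the round trip altogether.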
Where am I going wrong? Does anyone have any ideas?
For info, this will be parsing approximately 150 records a week, and characters with acute or grave accents could appear anywhere in the text.