I'm trying to process a simple XML file that looks like this:
<?xml version="1.0" ?>
<REVIEWS>
...other nodes...
<REVIEW ID="11">
<ARTIST>BENÜMB</ARTIST>
<TITLE>SOUL OF THE MARTYR</TITLE>
<LABEL>Relapse</LABEL>
<RATING>6</RATING>
<TEXT>some text</TEXT>
</REVIEW>
...more nodes follow...
</REVIEWS>
But as soon as I try to load it so I can query it via XPath, I get an error on the line that contains the "Ü":
// Read the whole file into a string
$fd = fopen("reviews/$file", "rb");
$contents = fread($fd, filesize("reviews/$file"));
fclose($fd);
// Parse the string into a DOM tree (domxml extension)
$dom = xmldoc($contents);
xmldoc() fails with errors such as:
"Entity: line 74" (line 74 is the one with the "Ü" in "BENÜMB")
"Input is not proper UTF-8, indicate encoding !"
and so on.
I read that PHP treats XML as UTF-8 by default. Shouldn't it then be able to read German characters like this without any errors?
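To make the byte-level problem concrete, here is a minimal PHP sketch. It assumes (the error message suggests it, but does not prove it) that reviews/$file was saved as ISO-8859-1, where "Ü" is the single byte 0xDC; in UTF-8 the same character is the two bytes 0xC3 0x9C. A lone 0xDC followed by "M" is not a valid UTF-8 sequence, so a parser that defaults to UTF-8 stops at that point. The helpers is_valid_utf8() and latin1_to_utf8() are hypothetical names for illustration; PHP's built-in utf8_encode() performs the same ISO-8859-1 to UTF-8 conversion. The sketch targets a current PHP, and the `/u` validity check may not behave the same way on 4.2.3.

```php
<?php
// Hypothetical helper: true if $s is well-formed UTF-8.
// With the /u modifier, preg_match() fails on invalid UTF-8 input.
function is_valid_utf8($s)
{
    return preg_match('//u', $s) === 1;
}

// Hypothetical helper: hand-rolled ISO-8859-1 -> UTF-8 conversion
// (same idea as utf8_encode()). Each Latin-1 byte equals its Unicode
// code point, so bytes >= 0x80 become a two-byte UTF-8 sequence.
function latin1_to_utf8($s)
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            $out .= $s[$i];                      // ASCII: unchanged
        } else {
            $out .= chr(0xC0 | ($b >> 6))        // lead byte
                  . chr(0x80 | ($b & 0x3F));     // continuation byte
        }
    }
    return $out;
}

$latin1 = "BEN\xDCMB";             // "BENÜMB" as stored in an ISO-8859-1 file (assumed)
$utf8   = latin1_to_utf8($latin1);

echo bin2hex($latin1), "\n";       // 42454edc4d42
echo bin2hex($utf8), "\n";         // 42454ec39c4d42
var_dump(is_valid_utf8($latin1));  // bool(false)
var_dump(is_valid_utf8($utf8));    // bool(true)
```

If the file really is Latin-1, converting $contents like this (or with utf8_encode()) before calling xmldoc() would hand the parser valid UTF-8; declaring the encoding in the XML prolog is the other way to tell the parser what the bytes mean.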
I tried adding encoding="ISO-8859-1" to the XML declaration, and that avoids the errors, but those characters are then mangled when I retrieve the data via XPath using $node->get_content(). Not good either.
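The mangled output is consistent with the XML layer handing text back as UTF-8: libxml2, which domxml is built on, stores text internally as UTF-8, so get_content() most likely returns "Ü" as the bytes 0xC3 0x9C even when the file declares ISO-8859-1. A page served as ISO-8859-1/Windows-1252 then typically displays those two bytes as "Ãœ". The hypothetical utf8_to_latin1() below is a hand-rolled equivalent of PHP's utf8_decode(), assuming only characters up to U+00FF need to survive; anything else becomes "?".

```php
<?php
// Hypothetical helper: UTF-8 -> ISO-8859-1 (same idea as utf8_decode()).
// Assumption: only code points U+0000..U+00FF matter; every other
// character is replaced with a single "?".
function utf8_to_latin1($s)
{
    $out = '';
    $len = strlen($s);
    $i = 0;
    while ($i < $len) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            $out .= $s[$i];              // ASCII byte: identical in both encodings
            $i++;
            continue;
        }
        // Sequence length, read from the lead byte.
        if (($b & 0xE0) == 0xC0) {
            $n = 2;
        } elseif (($b & 0xF0) == 0xE0) {
            $n = 3;
        } elseif (($b & 0xF8) == 0xF0) {
            $n = 4;
        } else {
            $n = 1;                      // stray or invalid byte
        }
        if ($n == 2 && $i + 1 < $len) {
            $cp = (($b & 0x1F) << 6) | (ord($s[$i + 1]) & 0x3F);
            $out .= ($cp <= 0xFF) ? chr($cp) : '?';
        } else {
            $out .= '?';                 // no ISO-8859-1 equivalent
        }
        $i += $n;
    }
    return $out;
}

$fromXml = "BEN\xC3\x9CMB";                      // "BENÜMB" as UTF-8 bytes
echo bin2hex($fromXml), "\n";                    // 42454ec39c4d42
echo bin2hex(utf8_to_latin1($fromXml)), "\n";    // 42454edc4d42
```

With the built-in function, the output line would be something like `echo utf8_decode($node->get_content());`. The alternative is to serve the page as UTF-8 (for example `header('Content-Type: text/html; charset=utf-8');`) and print the string unchanged.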
Is this a PHP bug or an issue with the way the file was created?
I'm on PHP 4.2.3, Windows 2000, Apache.
Thanks in advance for any help...