I am using PHP's SAX-style XML parser to parse a relatively large XML document (18 KB) that is stored on disk. I'm finding that the character data extracted varies depending on the structure of the XML document.
If I include multiple tabs or other whitespace, long strings of character data are truncated on the left side. For example, a file structured as follows:
<book>
<author>This is the name of the author.</author>
</book>
would provide the following character data:
"ame of the author."
The number of letters truncated appears to be related to the amount of whitespace in the XML document.
If I remove all whitespace so all tags are on one line, the process works like a charm.
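For reference, here is a minimal sketch of the kind of setup involved. It is not my exact code: the handler names, the `$buffer` and `$authors` variables, and the 16-byte chunk size are all hypothetical and chosen for illustration. It is written on the assumption that the parser may deliver the text of a single element in more than one call to the character data handler (for example, when the text straddles a read boundary), so the handler appends each piece instead of overwriting the previous one.

```php
<?php
// Hypothetical sketch: collects the text of each <author> element.
// All names here ($buffer, $authors, the handler functions) are illustrative.

$buffer  = '';
$authors = [];

function startElement($parser, $name, $attrs)
{
    global $buffer;
    $buffer = '';               // start each element with an empty buffer
}

function characterData($parser, $data)
{
    global $buffer;
    $buffer .= $data;           // append: one text node may arrive in several calls
}

function endElement($parser, $name)
{
    global $buffer, $authors;
    if ($name === 'AUTHOR') {   // case folding is on by default, so names are upper-cased
        $authors[] = trim($buffer);
    }
    $buffer = '';
}

$xml = "<book>\n\t<author>This is the name of the author.</author>\n</book>";

$parser = xml_parser_create();
xml_set_element_handler($parser, 'startElement', 'endElement');
xml_set_character_data_handler($parser, 'characterData');

// Feed the document in small pieces to mimic reading a file in chunks with fread().
$chunks = str_split($xml, 16);
$last   = count($chunks) - 1;
foreach ($chunks as $i => $chunk) {
    if (!xml_parse($parser, $chunk, $i === $last)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
}
```

If `characterData` assigned with `=` instead of appending with `.=`, only the last piece delivered for an element would survive, which would look like truncation from the left whose extent shifts as whitespace moves the read boundaries.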
Has anyone seen this type of behavior before?