Hi All,
I am attempting to parse a collection of XML files, some of which are not well-formed. The data I wish to extract is always well-formed, and I'm not interested in the section containing the errors, so I would like to simply cut that section out and pass only the well-formed XML to my parser.
I can do these two things separately, but I can't seem to figure out how to combine them in a working script.
This is the core of my working parser, which currently opens the remote file and reads it through the file pointer:
function run_parser($xml_file){
    //========================================
    // Run the main script
    //========================================
    $xml_parser = xml_parser_create();
    xml_set_element_handler($xml_parser, "startElement", "endElement");
    xml_set_character_data_handler($xml_parser, "characterData");
    $fp = @fopen($xml_file, "r")
        or die("Sorry, the connection to the mapping data server has failed. Please try again later on.");
    // Read the file in 4096-byte pieces and feed each piece to the parser
    while ($data = fread($fp, 4096))
        xml_parse($xml_parser, $data, feof($fp))
            or die(sprintf("XML error: %s at line %d",
                xml_error_string(xml_get_error_code($xml_parser)),
                xml_get_current_line_number($xml_parser)));
    fclose($fp);
    xml_parser_free($xml_parser);
}
This is the test script I wrote to split the XML at the end of the dodgy first part, which is quite often not well-formed:
$xml_file = "http://www.example.com/file.xml";
// Read XML file
$fp = @fopen($xml_file, "r")
    or die("Sorry, the connection to the mapping data server has failed. Please try again later on.");
$f_contents = ""; // initialise before appending to avoid an undefined variable notice
while ($line = @fgets($fp, 1024)) {
    $f_contents .= $line;
}
$chunks = explode("</dodgydata>", $f_contents);
print("<b>Contents of explode at:</b><br/><textarea cols='100' rows='50'>$chunks[1]</textarea>");
fclose($fp);
The stuff I want to put into xml_parse is stored in $chunks[1].
My question is, how do I get the data into the parser, please?
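For reference, this is a rough sketch of how I imagine the two pieces fitting together. It assumes xml_parse() will take the whole string in a single call when the third argument is true. The run_parser_on_string name, the sample data and the stub handlers are made up for illustration (my real handlers do more). It also assumes that whatever follows </dodgydata> is a single well-formed element on its own.

```php
<?php
// Stub handlers (hypothetical) that just record what the parser reports.
$seen = [];
function startElement($parser, $name, $attrs) { global $seen; $seen[] = "start:$name"; }
function endElement($parser, $name)           { global $seen; $seen[] = "end:$name"; }
function characterData($parser, $data) {
    global $seen;
    if (trim($data) !== "") $seen[] = "text:" . trim($data);
}

// Parse an XML string in one go; returns null on success or an error message.
function run_parser_on_string($xml) {
    $xml_parser = xml_parser_create();
    xml_set_element_handler($xml_parser, "startElement", "endElement");
    xml_set_character_data_handler($xml_parser, "characterData");
    $error = null;
    // Third argument true: this string is the final (and only) piece of data.
    if (!xml_parse($xml_parser, $xml, true)) {
        $error = sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser));
    }
    return $error;
}

// Made-up sample standing in for the downloaded file contents.
$f_contents = "<dodgydata>a & b <unclosed></dodgydata>\n<mapdata><point>42</point></mapdata>";

// Split once at the end of the dodgy section and keep what follows it.
$chunks = explode("</dodgydata>", $f_contents, 2);
$error  = run_parser_on_string(trim($chunks[1]));
```

If the part after </dodgydata> ends with a closing tag whose opening tag was cut off along with the dodgy section, I assume the opening tag would have to be prepended to the string before parsing. Element names arrive in upper case because case folding is on by default.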
Many thanks for your help.