Hi all,
My apologies if my DOM/XML vocabulary is a bit off, but this is my first attempt....
I'm trying to convert a 50MB XML file (containing 35,000 <$prod_tag> elements) into a CSV file, with each line holding the values of the tags inside one <$prod_tag>...</$prod_tag> block.
The script parses 100 <$prod_tag> elements at a time, saving the data in an array, then dumps it to file. The array is then reset to save memory.
However, the script starts off parsing 100 tags in ~1 second, but as time goes by it gets slower and slower, taking ~5 seconds per 100 tags, until the script eventually times out.
Firstly, why does tree traversal slow down? I've echoed memory_get_usage() every time the script dumps data to file and memory usage looks fine (it maxes out at ~1.1MB around the 100th tag).
Is it all the writing to disk? Or is it just the way DOM trees work? Can I not unset DOM nodes once they've been parsed?
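By "unset" I mean something like the rough sketch below (untested; $feed_filename and $prod_tag are the same variables as in my script further down): read the first <$prod_tag>, pull its data out, then detach it from the tree in the hope that the DOM frees its memory. Since getElementsByTagName() returns a live list, item(0) should then become the next product.

$doc = new DOMDocument();
$doc->load($feed_filename);
$items = $doc->getElementsByTagName($prod_tag);   // live node list

while (($item = $items->item(0)) !== null) {
    // ... pull the column values out of $item, exactly as in the script below ...
    $item->parentNode->removeChild($item);        // does this actually release memory?
}

Would something like that help, or does the memory stay allocated until the whole DOMDocument is destroyed?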
I'm lost! Anyway - here's my naked parsing script....
$doc = new DOMDocument();
$doc->load($feed_filename);
$items = $doc->getElementsByTagName($prod_tag);

$numItems = 0;
$todump   = '';   // output buffer; $handle is the already fopen()'d output file

foreach ($items as $item) {
    $data = array();

    foreach ($column_names as $column) {   // load up the data for each column tag
        $allnodes = $item->getElementsByTagName($column);
        $i = 0;
        foreach ($allnodes as $node) {     // there can be >1 instance of the column tag (esp. categories path)
            if ($i == 0) $data[$column]  = $node->nodeValue;
            else         $data[$column] .= "##" . $node->nodeValue;   // yup, >1 instances: delimit with ##
            $i++;
        }
    }

    $numItems++;

    $k = 0;
    foreach ($data as $val) {
        if ($numItems > 1 && $k === 0) $todump .= "\n";   // newline before every record except the first
        if ($k === 0) $todump .= $val;
        else          $todump .= "\t" . $val;
        $k++;
    }
    unset($data);
    unset($allnodes);

    if ($numItems % 100 == 0) {
        print "<code>_</code>";   // .memory_get_usage();
        if (fwrite($handle, $todump) === FALSE) return FALSE;
        $todump = '';             // reset the buffer
    }
    if ($numItems % 10000 == 0) print "<br>";
    flush();
}

// write out whatever is left in the buffer (the last batch of <100 items)
if ($todump !== '' && fwrite($handle, $todump) === FALSE) return FALSE;
fclose($handle);
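On a related note: if this is simply how DOM trees behave, I'm wondering whether I'd be better off with a pull parser so the whole 50MB file never sits in memory at once. The untested sketch below is the kind of thing I have in mind: walk the file with XMLReader and expand each <$prod_tag> into its own small, throwaway DOMDocument ($feed_filename, $prod_tag, $column_names and $handle are the same as above).

$reader = new XMLReader();
$reader->open($feed_filename);

while ($reader->read()) {
    // react only to the opening tag of each product element
    if ($reader->nodeType == XMLReader::ELEMENT && $reader->localName == $prod_tag) {
        // copy just this one product into a small throwaway DOM tree
        $dom  = new DOMDocument();
        $item = $dom->importNode($reader->expand(), true);
        $dom->appendChild($item);

        $fields = array();
        foreach ($column_names as $column) {
            $values = array();
            foreach ($item->getElementsByTagName($column) as $node) {
                $values[] = $node->nodeValue;
            }
            $fields[] = implode("##", $values);   // same ## delimiter as above
        }
        fwrite($handle, implode("\t", $fields) . "\n");
    }
}
$reader->close();
fclose($handle);

Has anyone tried this kind of approach on feeds of this size, or is my DOM-based loop fixable as it stands?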