Hi,
I am trying to write a script that parses RSS feeds from different websites, using PHP's SAX parser.
Everything seems to be working OK, except that when parsing the <description> tag of a feed, the parser calls the 'characterData' callback function before it has actually read the whole content of the <description> tag.
For example:
Imagine this part in the php.net RSS feed:
<item rdf:about="http://weblabor.hu/php-doc-chm">
<title>PHP Manual CHM Edition - 12th build</title>
<link>http://weblabor.hu/php-doc-chm</link>
<description>The 12th build of the extended CHM edition is out now, and available for download. This build contains updated content and user notes, as well as fixes the bugs found in the previous build. A new optional "phpZ" skin is also introduced in this release, courtesy of Gonzalo De la Pena Andreu.</description>
<dc:date>2003-09-06</dc:date>
</item>
Now, when the parser calls the data callback function, it does not pass the whole text enclosed in the <description> tag. Instead of one call to the function, several calls are made, each with only a fragment of the text.
If anybody has the slightest idea what I'm talking about, please help me.
This is my 'characterData' callback function:
<?php
function characterData($parser, $data)
{
    global $currentTag, $flag, $smarty;

    // If within an item block, print the item data
    switch ($currentTag) {
        // Stripped down for simplicity
        case 'DESCRIPTION':
            echo '[' . $data . "]\n<br>\n";
            break;
    }
}
?>
And here's the output:
[The 12th build of the extended CHM edition is out now, and available for download. This build contains updated content and user notes, as well as fixes the bugs found in the previous build. A new optional ]
<br>
["]
<br>
[phpZ]
<br>
["]
<br>
[ skin is also introduced in this release, courtesy of Gonzalo De la Pena Andreu.]
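
For what it's worth, the output above looks like the usual behaviour of expat-style parsers: the character data handler may be called several times for a single element, and the breaks often fall at entities (the quotes in the feed are presumably written as &quot;). A minimal sketch of one common workaround is to append each fragment to a buffer and only use the text once the closing tag arrives. The names $buffer, $descriptions, startElement and endElement, and the shortened sample XML, are assumptions made for illustration and are not part of the original script.

```php
<?php
// Sketch only: $buffer, $descriptions, startElement() and endElement()
// are hypothetical names, not taken from the original script.
$currentTag   = '';
$buffer       = '';
$descriptions = array();

function startElement($parser, $name, $attrs)
{
    global $currentTag, $buffer;
    $currentTag = $name;
    $buffer     = '';          // start a fresh buffer for each element
}

function characterData($parser, $data)
{
    global $currentTag, $buffer;
    // May be called several times per element, so append instead of printing.
    if ($currentTag == 'DESCRIPTION') {
        $buffer .= $data;
    }
}

function endElement($parser, $name)
{
    global $currentTag, $buffer, $descriptions;
    // The text is complete only once the closing tag has been seen.
    if ($name == 'DESCRIPTION') {
        $descriptions[] = $buffer;
    }
    $currentTag = '';
}

// Shortened sample input (assumed), with the quotes written as entities.
$xml = '<item><description>A new optional &quot;phpZ&quot; skin is also introduced.</description></item>';

$parser = xml_parser_create();   // case folding is on by default, so names arrive uppercased
xml_set_element_handler($parser, 'startElement', 'endElement');
xml_set_character_data_handler($parser, 'characterData');
xml_parse($parser, $xml, true);

echo '[' . $descriptions[0] . "]\n";
?>
```

With this arrangement the description should be printed once, in one piece, no matter how many fragments the parser delivers to 'characterData'.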