Hello people,
I'm at the stage where I urgently need to use XML. I've got something working, but I'm struggling to understand how to handle nested tags. I took this code from this site and am trying to run it on my server, but I'm hitting problems 🙁
This is my XML file (contractor.xml). The tags mean:
line - print a line of text
b - make the text bold
t - insert a tab character
linefeed - insert an empty line
The num and line attributes on the tags specify how many times the action should be repeated.
<?xml version='1.0'?>
<document>
<page number="1">
<line><b>THIS AGREEMENT</b> is made after the parties have considered
the following</line>
<line>circumstances which are material in the construction of this
Agreement.</line>
<linefeed line="1"></linefeed>
<line>A.<t num="1"></t>{name of employer} operates a {type of
business}</line>
<line><t num="1"></t>business within the City of {address of employer –
line3}.</line>
<linefeed line="1"></linefeed>
<line>B.<t num="1"></t>The contractor desires to join {name of
employer}</line>
<line><t num="1"></t>according to the terms of this Agreement.</line>
<linefeed line="1"></linefeed>
<line><b>THE PARTIES AGREE AS FOLLOWS:</b></line>
</page>
</document>
PHP source (the global array $attribs holds the attributes of the tags):
<?php
/* XML PARSER BODY AND SETUP */
$tabCharacter = " ";
$open_tags = array(
    'DOCUMENT' => '<document>',
    'PAGE' => '<page>',
    'LINE' => '<line>',
    'B' => '<b>',
    'LINEFEED' => '<linefeed>',
    'T' => '<t>'
);
$close_tags = array(
    'DOCUMENT' => '</document>',
    'PAGE' => '</page>',
    'LINE' => '</line>',
    'B' => '</b>',
    'LINEFEED' => '</linefeed>',
    'T' => '</t>'
);
// parsing opening tags
function startElement($parser, $name, $attr){
    global $open_tags, $temp, $current_tag;
    global $attribs;
    $current_tag = $name;
    if ($format = $open_tags[$name]){
        switch($name){
            case 'PAGE':
                $attribs["PAGE"] = $attr["NUMBER"]; // get page number
                echo "Page started, page num=$attribs[PAGE]<br>";
                break;
            case 'LINE':
                break;
            case 'B':
                echo "<b>";
                break;
            case 'LINEFEED':
                $attribs["LINE"] = $attr["LINE"]; // how many linefeeds
                break;
            case 'T':
                $attribs["TAB"] = $attr["NUM"]; // how many tabs
                break;
            default:
                break;
        }
    }
}
// parsing closing tags
function endElement($parser, $name, $attr=''){
    global $close_tags, $temp, $current_tag;
    global $attribs;
    if ($format = $close_tags[$name]){
        switch($name){
            case 'PAGE':
                echo "Page finished, page num=$attribs[PAGE]<br>";
                break;
            case 'B':
                echo "</b>";
                break;
            default:
                break;
        }
    }
}
// parsing text between tags
function characterData($parser, $data){
    global $current_tag, $temp, $catID;
    global $pdf, $attribs;
    switch($current_tag){
        case 'LINE':
            echo $data."<br>";
            $current_tag = '';
            break;
        case 'B':
            echo $data;
            $current_tag = '';
            break;
        case 'T':
            for ($i=0; $i<$attribs["TAB"]; $i++){
                echo " "; // that's the way to show a tab symbol :)
            }
            $current_tag = '';
            break;
        case 'LINEFEED':
            for ($i=0; $i<$attribs["LINE"]; $i++){
                echo "<br>";
            }
            $current_tag = '';
            break;
        default:
            break;
    }
}
$xml_file = 'contractor.xml';
$type = 'UTF-8';
// create parser
$xml_parser = xml_parser_create($type);
xml_parser_set_option($xml_parser, XML_OPTION_CASE_FOLDING, true);
xml_parser_set_option($xml_parser, XML_OPTION_TARGET_ENCODING, $type);
xml_set_element_handler($xml_parser, 'startElement', 'endElement');
xml_set_character_data_handler($xml_parser, 'characterData');
if (!($fp = fopen($xml_file, 'r'))) {
    die("Could not open $xml_file for parsing!\n");
}
// loop through the file and parse baby!
while ($data = fread($fp, 4096)) {
    if (!($data = utf8_encode($data))) {
        echo 'ERROR'."\n";
    }
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d\n\n",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}
xml_parser_free($xml_parser);
?>
The resulting output is:
Page started, page num=1<br>
<b>THIS AGREEMENT</b>circumstances which are material in the
construction of this Agreement.<br>
<br>
A.<br> <br>
B.<br> <br>
<b>THE PARTIES AGREE AS FOLLOWS:</b>Page finished, page num=1<br>
But the first line of that output (the one starting with <b>THIS AGREEMENT</b>) does not contain the text it should!
It should also contain: 'is made after the parties have considered the
following'
If you go through the entire XML file, it's quite clear that this PHP script drops any text that follows a nested tag!
What I mean is that this line:
<line><b>THIS AGREEMENT</b> is made after the parties have considered
the following</line>
will be output only as:
<b>THIS AGREEMENT</b><br>
and all the text after the </b> tag is thrown away. 🙁
If I change this line to:
<line><b>THIS AGREEMENT</b> is made after the parties <b>have</b>
considered the following</line>
the result will be:
<b>THIS AGREEMENT</b><b>have</b><br>
Once again, the text itself has disappeared 🙁
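To narrow it down, here is a cut-down sketch that shows the same symptom for me without the file or the page/tab handling. It is only a test case, not a fix. The handler names reproStart, reproEnd and reproData, the $output variable, and the shortened inline $xml string are made up for this test and are not part of my real script; I'm also assuming the parser treats an inline string the same way as the file.

```php
<?php
// Cut-down test case (hypothetical names, not the real script).
// $xml stands in for one line of contractor.xml.
$xml = '<line><b>THIS AGREEMENT</b> is made after the parties</line>';

$current_tag = '';
$output = '';

function reproStart($parser, $name, $attr) {
    global $current_tag, $output;
    $current_tag = $name;          // only the most recently opened tag is remembered
    if ($name === 'B') {
        $output .= '<b>';
    }
}

function reproEnd($parser, $name) {
    global $output;
    if ($name === 'B') {
        $output .= '</b>';
    }
    // $current_tag is left untouched here, exactly as in my endElement()
}

function reproData($parser, $data) {
    global $current_tag, $output;
    switch ($current_tag) {
        case 'LINE':
            $output .= $data . '<br>';
            $current_tag = '';
            break;
        case 'B':
            $output .= $data;
            $current_tag = '';
            break;
        default:
            break;                 // any text that reaches this branch is not output
    }
}

// Case folding is on by default, so tag names arrive in upper case.
$parser = xml_parser_create('UTF-8');
xml_set_element_handler($parser, 'reproStart', 'reproEnd');
xml_set_character_data_handler($parser, 'reproData');
xml_parse($parser, $xml, true);

echo $output, "\n";
```

When I run this, the bold part comes out but the text after </b> never makes it into $output, the same as with the full script. So the problem seems to be somewhere in how these three handlers work together, not in the file reading loop.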
Gosh, I spent the whole day today trying to pin down the problem and haven't found a solution yet.
I would appreciate any replies on this matter. Thank you,
Zboris.