So I have a chunk of HTML that looks roughly like this:
<table>
  <tr>
    <td>
      <font size="-1">
        Blah blah <b>BLAH1</b> <i>blah2 blah2</i> <br/>
        <span> BLAH3BLAH3 </span> <span> MOREBLAH4 </span>
      </font>
    </td>
  </tr>
</table>
I'm trying to use DOMXPath and an XPath expression to get everything inside the font node that comes before the <br/> tag.
Getting the entire contents of the font node is easy: //td/font just works, and I then dump the matched nodes out to a string with the following code:
....
$xpath = new DOMXPath($doc);
$elementNodes = $xpath->query($expression);
// convert these DOM nodes to strings for mixed-mode searching
$result = array();
print_r($elementNodes);
foreach ($elementNodes as $elementNode) {
    $string = dumpXML($elementNode);
    echo("s:".$string."\n\n");
    $result[] = $string;
}
....
function dumpXML($node){
    $output = '';
    $owner_document = $node->ownerDocument;
    if ($node->childNodes) {
        // the node has a child list: serialize each child in turn
        foreach ($node->childNodes as $el) {
            if ($owner_document) $output .= $owner_document->saveXML($el);
            //else print_r($node);
        }
    } else {
        // no child list: serialize the node itself
        if ($owner_document) $output .= $owner_document->saveXML($node);
    }
    //else print_r($node);
    return $output;
}
I try to do the same thing with //td/font/br/preceding-sibling::* and it gives me "BLAH1" in the above example (i.e. the contents of the first child node of the font node, at least that's how I interpret it).
What I want to get is "Blah blah <b>BLAH1</b> <i>blah2 blah2</i>" (or rather, the text node, bold node, italic node so that they get iterated over and all output into the string by dumpXML).
If I try //td/font/br/preceding-sibling::text() it will give me "Blah blah", but won't give me the contents of the <b> and <i> chunks.
This is driving me crazy.
I can't just hack in a regexp or something else since this is part of a framework, and everything works perfectly in every other instance - I really just need an XPath expression that will get everything in the font chunk that comes before the br tag. Any help would be greatly appreciated.
EDIT: Never mind, I figured out that I can get ::node() to do roughly what I want. Still not sure why ::* didn't do it though.
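For anyone landing here later, this is a minimal sketch of how the three node tests differ. In XPath 1.0, * on the preceding-sibling axis matches element nodes only, text() matches text nodes only, and node() matches any node. The markup is a trimmed copy of the sample above, loaded with loadXML() on the assumption that it is well-formed XML, and dumpMatches() is a hypothetical helper (not part of my framework) that serializes each matched node itself rather than its children.

```php
<?php
// Trimmed copy of the sample markup (assumed to be well-formed XML).
$html = '<table><tr><td><font size="-1">'
      . 'Blah blah <b>BLAH1</b> <i>blah2 blah2</i> <br/>'
      . '<span> BLAH3BLAH3 </span> <span> MOREBLAH4 </span>'
      . '</font></td></tr></table>';

$doc = new DOMDocument();
$doc->loadXML($html);
$xpath = new DOMXPath($doc);

// Hypothetical helper: serialize every node matched by $expression
// (the node itself, not its children) and concatenate the results.
function dumpMatches(DOMXPath $xpath, string $expression): string {
    $output = '';
    foreach ($xpath->query($expression) as $node) {
        $output .= $node->ownerDocument->saveXML($node);
    }
    return $output;
}

// "*" selects only the element siblings before <br/> (the <b> and <i>).
$elementsOnly = dumpMatches($xpath, '//td/font/br/preceding-sibling::*');

// "text()" selects only the text siblings ("Blah blah " plus whitespace).
$textOnly = dumpMatches($xpath, '//td/font/br/preceding-sibling::text()');

// "node()" selects every sibling before <br/>, text and elements alike.
$allNodes = dumpMatches($xpath, '//td/font/br/preceding-sibling::node()');

echo "elements: ", $elementsOnly, "\n";
echo "text:     ", $textOnly, "\n";
echo "nodes:    ", $allNodes, "\n";
```

If this reading is right, it would also explain the "BLAH1" I saw with ::*: the query matched the <b> element, and dumpXML() then printed that element's children rather than the element itself.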