Hi everyone!
I am trying to fetch some web pages and parse them with the PHP DOM extension.
My problem is that PHP DOM seems to get confused by anchors.
For example, fetch a Wikipedia page (http://fr.wikipedia.org/wiki/Linux) and load it with DOMDocument:
$snoopy = new Snoopy;
$snoopy->fetch("http://fr.wikipedia.org/wiki/Linux");
$htmltext = $snoopy->results;
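// loadHTML() is an instance method, so create the document first
$domdoc = new DOMDocument();
$domdoc->loadHTML($htmltext);
Loading the page produces warnings like these: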
Warning: DOMDocument::loadHTML() [function.DOMDocument-loadHTML]: ID top already defined in Entity
Warning: DOMDocument::loadHTML() [function.DOMDocument-loadHTML]: ID Histoire already defined in Entity
Warning: DOMDocument::loadHTML() [function.DOMDocument-loadHTML]: ID Le_projet_GNU already defined in Entity
etc.
All of these IDs are anchors: <a href="#Histoire">, <a href="#Le_projet_GNU">, etc.
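For reference, here is a minimal sketch that should show the same kind of message without Snoopy or a network request. The sample markup and the helper name loadHtmlCollectingErrors are made up for illustration, and I am assuming the warning appears whenever the same ID value occurs more than once in the page. It uses libxml_use_internal_errors() so the parser messages are collected instead of being printed as PHP warnings:

```php
<?php
// Hypothetical helper: parse HTML and return the document together with
// the libxml messages that would otherwise be emitted as PHP warnings.
function loadHtmlCollectingErrors(string $html): array
{
    // Collect libxml errors internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    $errors = libxml_get_errors();
    libxml_clear_errors();

    // Restore whatever error handling mode was active before.
    libxml_use_internal_errors($previous);

    return [$doc, $errors];
}

// Made-up sample markup: the ID "Histoire" is deliberately used twice.
$html = '<html><body>'
      . '<a id="Histoire" href="#Histoire">Histoire</a>'
      . '<p id="Histoire">...</p>'
      . '</body></html>';

[$doc, $errors] = loadHtmlCollectingErrors($html);

// Print each collected message with the line number libxml reported.
foreach ($errors as $error) {
    echo trim($error->message), ' (line ', $error->line, ")\n";
}
```

The document is still built either way, so the messages look like warnings about the markup rather than a failure to parse it.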
I tried changing the encoding and a few other hacks ;-) but the messages are still there.
Does anybody have a suggestion?