[RESOLVED] Getting textarea contents using DOMDocument

Metzen · Jun 18, 2008

Hi,

I'm trying to scrape a bunch of HTML documents, and having good success, except for getting the innerHTML / value of textareas.

$html = file_get_contents("whatever.html");
$dom = new DOMDocument();
@$dom->loadHTML($html);
$nodes = $dom->getElementsByTagName('*');
foreach ($nodes as $node) {
    if ($node->nodeName == "input" && $node->getAttribute('name') == "fname") {
        echo "first_name: " . $node->getAttribute('value') . "<br>\n";
    }
}

That is working great. But for a textarea there isn't a value attribute, and innerHTML doesn't work.

How do you get the contents of a textarea using DOMDocument?

Thanks!
Metzen

bradgrafelman · Jun 18, 2008

Try $node->nodeValue ?

Metzen · Jun 18, 2008

Weird. The textarea node has a child node that is a text node, and you have to read it separately.

if ($node->nodeName == "textarea" && $node->getAttribute('name') == "whatever") {
    $inodes = $node->childNodes;
    foreach ($inodes as $inode) {
        echo "whatever textarea: " . $inode->substringData(0, 100000) . "<br>\n";
    }
}

As far as I can tell, you can't just read the value directly, so you have to request a substring that is longer than the whole text.

Not exactly intuitive, and definitely not documented (I just stumbled across it in my own tests), but it works.
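For what it's worth, the text child should be a DOMText node, which inherits a data property and a length property from DOMCharacterData, so the oversized substring length may not be needed. A minimal sketch, assuming a made-up HTML snippet in place of the real scraped page (the markup and variable names here are illustrative only):

```php
<?php
// Hypothetical markup standing in for the scraped page.
$html = '<html><body><form>'
      . '<textarea name="whatever">Hello, world</textarea>'
      . '</form></body></html>';

$dom = new DOMDocument();
@$dom->loadHTML($html);

// Grab the first textarea in the document.
$textarea = $dom->getElementsByTagName('textarea')->item(0);

$full = '';
$sub  = '';
foreach ($textarea->childNodes as $child) {
    if ($child instanceof DOMText) {
        // data holds the whole string; length is its size in characters.
        $full .= $child->data;
        // Same result via substringData, using the real length as the count.
        $sub  .= $child->substringData(0, $child->length);
    }
}

echo "data: " . $full . "\n";
echo "substringData: " . $sub . "\n";
```

Both variables should end up holding the same text, so reading data is probably the simpler choice.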

bradgrafelman · Jun 18, 2008

If the text is appended as a childNode, you can most likely use the nodeValue on that child node instead of substringData.
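A rough sketch of that idea, assuming a made-up HTML snippet (the markup, the field name "whatever" and the variable names are illustrative, not taken from the real page). It reads nodeValue from the text child and, for comparison, from the textarea element itself, which returns the text of its descendants:

```php
<?php
// Hypothetical markup standing in for the scraped page.
$html = '<html><body><form>'
      . '<input name="fname" value="Ada">'
      . '<textarea name="whatever">Some notes here</textarea>'
      . '</form></body></html>';

$dom = new DOMDocument();
@$dom->loadHTML($html);

$fromChild   = '';
$fromElement = '';
foreach ($dom->getElementsByTagName('textarea') as $node) {
    if ($node->getAttribute('name') !== 'whatever') {
        continue;
    }
    // nodeValue of the text child (empty string if the textarea has no content).
    $fromChild = $node->firstChild !== null ? $node->firstChild->nodeValue : '';
    // nodeValue of the element itself: the text content of its descendants.
    $fromElement = $node->nodeValue;
}

echo "from child: " . $fromChild . "<br>\n";
echo "from element: " . $fromElement . "<br>\n";
```

For a textarea with a single text child the two values should match, so calling nodeValue on the textarea node directly skips the inner loop.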

Metzen · Jun 18, 2008

$node->nodeValue...

Well, yeah, if you want to make it really easy. :}

That works great. It would be nice if that were documented somewhere.

Thanks for the help.

bradgrafelman · Jun 18, 2008

Metzen wrote:
It would be nice if that were documented somewhere.

Documentation for DOMNode objects: see the DOMNode class page in the PHP manual.
