Hi,

I'm trying to scrape a bunch of HTML documents, and having good success, except for getting the innerHTML / value of textareas.

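$html = file_get_contents("whatever.html");
$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect markup warnings instead of hiding them with @
$dom->loadHTML($html);
libxml_clear_errors();

// Walk every element and print the value of the "fname" input
$nodes = $dom->getElementsByTagName('*');
foreach ($nodes as $node) {
    if ($node->nodeName == "input" && $node->getAttribute('name') == "fname") {
        echo "first_name: " . $node->getAttribute('value') . "<br>\n";
    }
}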

That is working great. But for a textarea there isn't a value attribute, and innerHTML doesn't work.

How do you get the contents of a textarea using DOMDocument?

Thanks!
Metzen

    Weird. The textarea node has a child node that is a text node, and you have to read it separately.

    if ($node->nodeName == "textarea" && $node->getAttribute('name') == "whatever") {
        $inodes = $node->childNodes;
        foreach ($inodes as $inode) {
            echo "whatever textarea: " . $inode->substringData(0, 100000) . "<br>\n";
        }
    }

    As far as I can tell, you can't just read the value directly, so you have to ask for a substring that is longer than the whole text.

    Not exactly intuitive, and definitely not documented (I just stumbled across it in my own tests), but it works.
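
    For what it's worth, the oversized guess may not be needed. Text nodes in PHP's DOM extension are DOMCharacterData objects, which expose a length property (and a data property holding the full string). A minimal sketch, assuming a made-up sample document; the markup and the $result variable are illustrative only:

```php
<?php
// Hypothetical sample markup; "whatever" is the field name used in the post above.
$html = '<form><textarea name="whatever">Some notes here</textarea></form>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();

$result = '';
foreach ($dom->getElementsByTagName('textarea') as $node) {
    if ($node->getAttribute('name') != "whatever") {
        continue;
    }
    foreach ($node->childNodes as $inode) {
        // Only character data nodes (text, CDATA, comments) have substringData()
        if ($inode instanceof DOMCharacterData) {
            // length is the exact character count, so no 100000 guess is required;
            // $inode->data would return the same string in one step
            $result .= $inode->substringData(0, $inode->length);
        }
    }
}
echo "whatever textarea: " . $result . "\n";
```

    The instanceof check is there because substringData() only exists on character data nodes, so any other kind of child is skipped instead of causing a fatal error.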

      If the text is appended as a childNode, you can most likely use the nodeValue on that child node instead of substringData.

        $node->nodeValue...
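
        Spelled out, that suggestion might look like the sketch below. The sample markup and the $fromChild / $text variables are made up for illustration. It reads nodeValue from the child text node, and also from the textarea element itself, which in PHP's DOM extension returns the text of its children:

```php
<?php
// Hypothetical sample markup; "whatever" is the field name used in the thread.
$html = '<form><textarea name="whatever">Some notes here</textarea></form>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();

$fromChild = null;
$text = null;
foreach ($dom->getElementsByTagName('textarea') as $node) {
    if ($node->getAttribute('name') == "whatever") {
        // nodeValue on the child text node, as suggested above
        $fromChild = $node->firstChild ? $node->firstChild->nodeValue : '';
        // nodeValue on the element itself yields the same text here
        $text = $node->nodeValue;
    }
}
echo "whatever textarea: " . $text . "\n";
```

        Reading from the element saves the inner loop over childNodes, and an empty textarea simply gives an empty string.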

        Well, yeah, if you want to make it really easy. :}

        That works great. It would be nice if that were documented somewhere.

        Thanks for the help.

          Metzen wrote:

          It would be nice if that were documented somewhere.

          Documentation for DOMNode objects: see the DOMNode class page in the PHP manual (https://www.php.net/manual/en/class.domnode.php).
