Hi all,

I am new to DOMDocument, and I came up with this code to extract content from a website:

<?php                                                                           

$file  = file_get_contents('http://bbs.wenxuecity.com/cooking');                    

$html = new DOMDocument();                                                      
$html->loadHtml( $file );
$xpath = new DOMXPath( $html );

//working query, that proves the DOMXPath object was created correctly.
$links = $xpath->query( "//a" );

//not working query, I can't figure out why as the syntax works fine in Firebug
//$links = $xpath->query( "//a[contains(@class,'post')]" );

header('Content-type: text/html; charset=utf-8');
foreach ( $links as $link )
{
echo $link->textContent, "<br />";
} ?>

My question is: the first query works just fine, but the second (commented out) just won't return anything.

Am I missing something here?

Thanks

    Okay, I figured out where the problem is: file_get_contents does not get the same content that a regular browser sees. I have tried a few alternatives using cURL, but that still didn't solve the problem. I simply can't get the content of the <div id="module"> node in that page by any PHP method.

    I also tried using ini_set to set the browser user agent.
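
    For reference, a minimal sketch of sending browser-like headers through a stream context rather than ini_set. The helper name buildBrowserContext, the user agent string, and the extra headers are made up for illustration. Whether this server varies its response on these headers is an assumption. If the missing markup is generated by JavaScript in the browser (Firebug shows the DOM after scripts have run), no request header will change what PHP receives.

```php
<?php
// Hypothetical helper: builds a stream context that sends browser-like headers.
function buildBrowserContext(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent, // sent as the User-Agent request header
            'header'     => "Accept: text/html\r\nAccept-Language: en-US,en;q=0.8\r\n",
            'timeout'    => 10,         // seconds
        ],
    ]);
}

// The user agent string below is an invented example.
$context = buildBrowserContext('Mozilla/5.0 (compatible; ExampleFetcher/1.0)');

// Uncomment to fetch the page with these headers:
// $file = file_get_contents('http://bbs.wenxuecity.com/cooking', false, $context);
```

    Saving the fetched $file to disk and comparing it with the browser's "view source" (not the Firebug HTML panel) would show whether the two responses really differ.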

    Help please;-)

      file_get_contents() is going to make an HTTP request to the remote web server, just as a web browser would.
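
      One way to separate the query from the download is to run the same XPath expression against a small HTML string. The sketch below uses invented sample markup (not taken from the real page). If it finds the links, the expression itself is fine in DOMXPath, and the empty result most likely means the fetched HTML does not contain the class.

```php
<?php
// Sample markup is invented for illustration; it is not the real page.
$sample = '<html><body>'
        . '<a class="post first" href="/a">First post</a>'
        . '<a class="nav" href="/b">Navigation</a>'
        . '<a class="post" href="/c">Second post</a>'
        . '</body></html>';

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($sample);
libxml_clear_errors();

// Same expression as the commented-out query in the question.
$xpath = new DOMXPath($doc);
$links = $xpath->query("//a[contains(@class,'post')]");

$titles = [];
foreach ($links as $link) {
    $titles[] = $link->textContent;
}

// Hypothetical follow-up on the downloaded page: is the class in the raw HTML at all?
// var_dump(strpos($file, 'post') !== false);
```

      Here $titles ends up holding the text of the two anchors whose class attribute contains "post", and the "nav" link is skipped.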
