Hi everyone,
I am currently trying some website screen scraping. The Mozilla Firefox plugins XPather and MIT Piggy Bank's Solvent give me an XPath that seems to work within the plugins, but it doesn't work in PHP (the result length is 0 but should be 4; PHP 5.2.6 on Windows). I couldn't find a helpful tutorial for this case. Can someone spot the problem?
Cheers
bluepuma
<?php
$target_url = 'http://www.google.de/intl/en/about.html';
$userAgent = 'Opera/9.00 (Windows NT 5.1; U; en)';
// make the cURL request to $target_url
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL, $target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html = curl_exec($ch);
if (!$html) {
    echo "cURL error number: " . curl_errno($ch) . "<br/>\n";
    echo "cURL error: " . curl_error($ch);
    exit;
}
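curl_close($ch);
// parse the fetched HTML string into a DOMDocument
// (loadHTML() expects markup, not a URL; passing $target_url yields an empty result)
$dom = new DOMDocument();
@$dom->loadHTML($html);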
// grab data with xpath
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate('/html/body/div[@class="g-doc about"]/div[@class="g-section g-tpl-180"]/div[@class="g-unit content"]/div[@class="g-section g-tpl-50-50 contentr1"]/div[@class="g-unit g-first contentr1c1"]/p');
echo 'Evaluate Length: ' . $hrefs->length;
?>
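For comparison, here is a minimal self-contained sketch that skips cURL and loads a hardcoded HTML string, so it can be run offline. The markup and the `contentr1c1` selector are hypothetical stand-ins for the real page, not a copy of it. It also swaps the full `@class="..."` comparisons for the common `contains(concat(" ", normalize-space(@class), " "), " name ")` idiom, which matches a single class token and keeps working if the page adds or reorders classes.

```php
<?php
// Hypothetical markup standing in for the fetched page (not Google's real HTML).
$html = '<html><body>'
      . '<div class="g-doc about">'
      . '<div class="g-unit g-first contentr1c1"><p>One</p><p>Two</p></div>'
      . '</div>'
      . '</body></html>';

$dom = new DOMDocument();
// Collect libxml warnings about sloppy markup instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html); // pass the markup itself, never the URL
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Match one class token rather than the exact class attribute value.
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " contentr1c1 ")]/p';
$paragraphs = $xpath->query($query);

echo 'Query Length: ' . $paragraphs->length . "\n";
foreach ($paragraphs as $p) {
    echo $p->textContent . "\n";
}
```

Running it with the PHP CLI should print the count followed by each paragraph's text. Keep in mind that XPaths copied from Firefox plugins describe the browser's repaired DOM, which can differ from the tree libxml builds from the raw source (for example, Firefox inserts `tbody` into tables), so shorter paths anchored on a distinctive class tend to be more robust.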