I am using this as a guide to how screen scraping works:
http://www.techdose.com/tutorials/php/eBayParsing/
with the full source of the tutorial script on this page:
http://www.techdose.com/tutorials/php/eBayParsing/ebayParsing3.php
Now I tried to replicate it for Mininova, but I'm stumped as to how the script actually gets the information out of the different tables. Here's my modified script:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>alpha script</title>
</head>
<body>
<?php
/*******************************************************************
Filename: mininovaparser.php
Description:
This script demonstrates a technique to parse Mininova search results.
Author: Wayne Eggert
Last Updated: December 2004
*******************************************************************/
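$URL = "http://www.mininova.org/search/?search=" . urlencode("linux");
$file = fopen($URL, "r");
if ($file === false) {
    die("Could not open $URL");
}
// read the whole page into $r, 8 KB at a time
$r = "";
while (!feof($file)) {
    $r .= fread($file, 8192);
}
fclose($file);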
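// split the page at every opening <table ...> tag: element N (N >= 1) is the
// HTML that follows the Nth <table> tag; element 0 is what precedes the first one
$mininovaTABLEArray = preg_split('/<table\b[^>]*>/i', $r);
// now try to find which <table> contains the search results
$resultTable = null;
for ($x = 0; $x < count($mininovaTABLEArray); $x++) {
    if (strstr($mininovaTABLEArray[$x], "Leechers")) { // marker text that appears near the results
        // +1 (carried over from the eBay tutorial) takes the table AFTER the one
        // containing the marker; use $x if "Leechers" is inside the results table itself
        $resultTable = $x + 1;
    }
}
if ($resultTable === null || !isset($mininovaTABLEArray[$resultTable])) {
    die("Could not find the results table");
}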
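// split the results table into rows: one array element per <tr ...> tag
$mininovaTRArray = preg_split('/<tr\b[^>]*>/i', $mininovaTABLEArray[$resultTable]);
echo "<br /><b>Mininova Results:</b><br /><br />";
// skip element 0 (the HTML before the first <tr>) and element 1 (the first row)
$start = 2;
$end = count($mininovaTRArray);
for ($i = $start; $i < $end; $i++) {
    // split the row into cells: one array element per <td ...> tag
    $mininovaTDArray = preg_split('/<td\b[^>]*>/i', $mininovaTRArray[$i]);
    //print_r($mininovaTDArray);
    // the cell indices 4, 6, 7 and 8 come from the eBay page layout;
    // use print_r() above to find which indices hold Mininova's columns
    if (!isset($mininovaTDArray[8]) || !preg_match('/<a.*?<\/a>/is', $mininovaTDArray[4], $match)) {
        continue; // not a result row with enough cells and a link
    }
    $torrent = strip_tags($match[0], "<a>"); // keep the link, drop other tags
    if ($torrent != "") {
        $torrent_name = $torrent;
        $price = strip_tags($mininovaTDArray[6]);
        // see if there's a Buy-It-Now price: "$1.00$5.00" splits into ["", "1.00", "5.00"]
        $priceArray = explode('$', $price);
        $torrent_price = $priceArray[1] ?? "";
        $torrent_bin = $priceArray[2] ?? "";
        $torrent_bids = strip_tags($mininovaTDArray[7]);
        $torrent_timeleft = strip_tags($mininovaTDArray[8]);
        echo $torrent_name . " " . $torrent_price . " " . $torrent_bids . " " . $torrent_timeleft . "<br />";
    }
}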
?>
</body>
</html>
It's almost exactly the same as the tutorial, but I don't know how to make it distinguish between the tables, and the tutorial doesn't go into depth about that. Could anyone give me an in-depth explanation of how it does this, and give me a hand with the script?
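As far as the code shows, the script never really parses the HTML. `preg_split` cuts the page at every opening `<table ...>` tag, so each array element is the markup that follows one table tag, and the loop picks the element that contains a marker string ("Leechers") expected to appear only near the results. The same split is then repeated on `<tr` for rows and on `<td` for cells, and hard-coded indices select the columns. Below is a minimal standalone sketch of that idea; `findTableRows`, its `$offset` parameter and the sample HTML are hypothetical (the sample is not Mininova's real markup), so the indices on the live page would still need to be checked with `print_r()`.

```php
<?php
// Hypothetical helper: split $html the same way the script does and return
// the text of each cell in the table chunk that contains $marker.
// $offset mirrors "$resultTable = $x+1": 0 means the marker is inside the
// results table itself, 1 means the results are in the next <table>.
function findTableRows(string $html, string $marker, int $offset = 0): array
{
    // Every element after index 0 starts right after one "<table ...>" tag.
    $tables = preg_split('/<table\b[^>]*>/i', $html);

    $found = null;
    foreach ($tables as $index => $chunk) {
        if (strpos($chunk, $marker) !== false) {
            $found = $index + $offset;
            break; // first chunk containing the marker wins
        }
    }
    if ($found === null || !isset($tables[$found])) {
        return [];
    }

    $rows = [];
    // Element 0 of each split is whatever precedes the first <tr>/<td>, so skip it.
    foreach (array_slice(preg_split('/<tr\b[^>]*>/i', $tables[$found]), 1) as $tr) {
        $cells = array_slice(preg_split('/<td\b[^>]*>/i', $tr), 1);
        if (count($cells) === 0) {
            continue; // header row built from <th> cells, or no cells at all
        }
        // strip_tags() removes </td>, </tr>, </table> and any links inside the cell
        $rows[] = array_map(fn($cell) => trim(strip_tags($cell)), $cells);
    }
    return $rows;
}

// Made-up sample page: a navigation table followed by a results table.
$html = <<<HTML
<table class="nav"><tr><td>Home</td><td>Browse</td></tr></table>
<table class="results">
  <tr><th>Name</th><th>Seeds</th><th>Leechers</th></tr>
  <tr><td><a href="/tor/1">linux-a.iso</a></td><td>10</td><td>2</td></tr>
  <tr><td><a href="/tor/2">linux-b.iso</a></td><td>5</td><td>1</td></tr>
</table>
HTML;

foreach (findTableRows($html, "Leechers") as $row) {
    echo implode(" | ", $row), "\n";
}
```

Running it prints `linux-a.iso | 10 | 2` and `linux-b.iso | 5 | 1`. Unlike the original loop, which keeps the last match, the sketch stops at the first table containing the marker. Either way the approach is fragile with nested tables: each chunk runs only up to the next `<table` tag, so rows of an outer table that come after a nested table end up in the nested table's chunk.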