Downloading and parsing HTML, please help :)

map

Hi,

I'm trying to download a page from a URL, say www.google.co.uk, parse the HTML, and search it for certain info, e.g. dates and such.

For example, if I downloaded an eBay auction page, there are numerous dates on it. How would I ensure the right date is used?

Any help would be greatly appreciated.

Thanks,

Mark

BigZack

I would suggest using wget (http://www.gnu.org/software/wget/wget.html), sed (http://www.cornerstonemag.com/sed/), grep (http://www.gnu.org/software/grep/grep.html), and/or awk (http://www.canberra.edu.au/~sam/whp/awk-guide.html).

If you need to use just PHP, check out these functions for reading files and streams: fgetss(), fgets(), fopen(), fsockopen(), popen(), strip_tags(), fread(), fgetc(), stream_get_line(), and socket_set_timeout().
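As a rough sketch of reading line by line with a few of those functions: the example below uses an in-memory stream with made-up content as a stand-in for a downloaded page, and the "Ends:" marker string is only an assumption for illustration.

```php
<?php
// Stand-in for a downloaded page: an in-memory stream with made-up content.
$fp = fopen('php://memory', 'r+');
fwrite($fp, "<html>\n<b>Price:</b> 10.00\n<b>Ends:</b> 25-Dec-2004\n</html>\n");
rewind($fp);

$lineNumber = 0;
$found = array();
while (($line = fgets($fp)) !== false) {
    $lineNumber++;
    // Keep only lines containing the (hypothetical) marker string.
    if (strpos($line, 'Ends:') !== false) {
        // Store the tag-free text, keyed by the line number it was found on.
        $found[$lineNumber] = trim(strip_tags($line));
    }
}
fclose($fp);

print_r($found);
```

With a real page you would pass the URL to fopen() instead of 'php://memory' and skip the fwrite()/rewind() step.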

You will also need the PHP Regular Expression Functions (http://us4.php.net/manual/en/ref.pcre.php) for pattern matching.
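To give an idea of what the pattern matching might look like for dates, here is a minimal sketch. It assumes the dates are written like 25-Dec-2004; that format and the sample string are made up for illustration, so adjust the pattern to whatever the real page uses.

```php
<?php
// Hypothetical sample text; a real page would come from fopen()/fread().
$html = '<p>Listed: 18-Dec-2004</p><p>Ends: 25-Dec-2004</p>';

// Day, three-letter month, four-digit year (an assumed date format).
$pattern = '/\b(\d{1,2})-([A-Z][a-z]{2})-(\d{4})\b/';

// preg_match_all() returns the number of full matches it found.
$count = preg_match_all($pattern, $html, $matches);

// $matches[0] holds the full matches; $matches[1] to $matches[3] hold the captured groups.
foreach ($matches[0] as $date) {
    echo $date, "\n";
}
```

Note that this finds every date on the page, so on its own it does not tell you which one is the right one.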

Basically, you just need to open the file, read through it, and pull the data out using line numbers, matching strings, or a combination of both.

$fp = fopen('http://www.website.com/page.html', 'r'); // open the URL (requires allow_url_fopen to be enabled)
if (!$fp) {
    die('Could not open the page');
}

$content = ''; // start with an empty string so .= has something to append to
while (!feof($fp)) {
    $content .= fread($fp, 150000); // read the page in chunks and append each chunk to $content
}

fclose($fp); // close the file

$exp = '/this(.*?)that/s'; // placeholder regular expression: matches whatever is between "this" and "that"
$match = preg_match_all($exp, $content, $matches); // search $content for all matches of $exp and store them in $matches

if ($match) {
    foreach ($matches[0] as $m) { // $matches[0] holds the full matches
        echo strip_tags($m), "\n"; // print each match with any HTML tags removed
    }
}
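
As for making sure you get the right date when the page has several, one approach is to include the text or label next to the date in the pattern, so that only the date beside that label matches. A minimal sketch, assuming the page marks the date you want with a label like "Ends:" (the label, markup, and date format here are hypothetical, so check the real page source):

```php
<?php
// Hypothetical markup with two dates; only the one after "Ends:" is wanted.
$content = '<td>Started:</td><td>18-Dec-2004</td>'
         . '<td>Ends:</td><td>25-Dec-2004</td>';

// Strip tags first so the label and the date sit next to each other as plain text.
$text = strip_tags($content);

// Anchor on the label, then capture the date that follows it.
$endDate = null;
if (preg_match('/Ends:\s*(\d{1,2}-[A-Za-z]{3}-\d{4})/', $text, $m)) {
    $endDate = $m[1]; // first captured group: the date itself
    echo $endDate, "\n";
}
```

This will break if the site changes its wording or layout, so it is worth handling the case where nothing matches ($endDate stays null).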