Hi again. Thanks to the people who responded to my last query, but now I have another...
I've customised the excellent Grabbing Tools (from the Zend.com code gallery) to grab press release titles and URLs. However, titles alone aren't much to search on, so is there a way of expanding these tools so I can also grab a summary of each release (say the first 100 words)? Sometimes there is a summary on the page, but most times I'd need to follow the link to the release itself. Ideally, I'd like to be able to treat the summary as a variable in its own right, e.g. $out[3] or, even better, $summary.
Anyway, here's the rough code I'm using...
//begin add
// Check that the site is reachable before trying to read from it.
$fp = fsockopen("www.newssite.com", 80, $errno, $errstr, 30);
if (!$fp) {
    echo "The site is currently down<br>\n";
    exit;
} else {
    fclose($fp);
    // end add
    $file = fopen("http://www.newssite.com/press releases", "r");
    if (!$file) {
        echo "error when connect\n";
        exit;
    }
    $i = 0;
    // On a match: $out[1] = anchor name, $out[2] = URL, $out[3] = title.
    $regex = "/<a name=\"(.*)\" href=\"(.+)\">(.+)<\/a>/i";
    while (!feof($file)) {
        $line = fgets($file, 1024);
        if (preg_match($regex, $line, $out)) {
            $header = str_replace("'", "‘", $out[3]);
            $i++;
            if ($i > 30) { break; }
        }
    }
    fclose($file);
}
I realise I'd have to redesign the code for every page I'm grabbing summaries from, but just a kick-start would be appreciated.
The only solution I can think of currently is to download the entire linked-to page and search on that, but my storage space is limited (ain't it always! lol)
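To show what I mean, here is a minimal sketch of that idea. It assumes the linked page can be read in one go with file_get_contents() and only ever held in memory, so nothing but the short summary would need storing. The names first_words() and fetch_summary() are my own, made up for illustration; they are not part of Grabbing Tools, and I'm also assuming the grabbed link is an absolute URL.

```php
<?php
// Hypothetical helper: reduce an HTML string to its first $limit words
// of plain text.
function first_words($html, $limit = 100)
{
    // Drop script and style blocks, whose contents are not readable text.
    $html = (string) preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);
    // Put a space before every tag so adjacent blocks don't run together
    // once the tags are stripped.
    $html = str_replace('<', ' <', $html);
    // Strip the tags and decode entities such as &amp;.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    // Split on runs of whitespace, discarding empty pieces.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    return implode(' ', array_slice($words, 0, $limit));
}

// Hypothetical helper: fetch a page into memory only (nothing is written
// to disk) and return its summary, or '' if the fetch fails.
function fetch_summary($url, $limit = 100)
{
    $html = @file_get_contents($url);
    return ($html === false) ? '' : first_words($html, $limit);
}
```

Inside the grabbing loop I imagine this would be called as $summary = fetch_summary($out[2]); but it takes the first 100 words of the whole page, navigation and all, so each site would presumably still need its own way of finding where the release text starts.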
Many thanks in advance for any help or pointers.