I am building a news extractor script to practice content-scraping techniques, and I have a question about some strange behavior in my code. When I run the script, the headlines are parsed correctly, but each headline is printed several times instead of just once. Can anyone explain why the headlines are printed repeatedly? Here is my code:
<?php
function getSlashdotNews() {
    $oldHeadline = "";
    $connection = fopen("http://www.slashdot.org", "r");
    while(!feof($connection)) {
        $read = fgets($connection, 1024);
        $search = eregi("(FACE=\"arial,helvetica\" SIZE=\"4\" COLOR=\"#FFFFFF\"><B>)(.*)(</B></FONT></TD>)", $read, $matches);
        if(eregi("<A HREF=\"(.*)\"><FONT COLOR=\"#FFFFFF\">(.*)</FONT></A>: (.*)", $matches[2], $linkRem)) {
            $newHeadline = $linkRem[3];
        } else {
            $newHeadline = $matches[2];
        }
        //if(!empty($newHeadline) && $oldHeadline != $newHeadline) {
        print("$newHeadline<br>\n");
        // $oldHeadline = $newHeadline;
        //}
    }
    fclose($connection);
}
getSlashdotNews();
?>
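
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the print statement runs on every line read, whether or not the first pattern matched, and as far as I can tell eregi() leaves $matches untouched when it fails, so the last headline found would be printed again for each following line. A minimal sketch of the loop that prints only when the headline pattern actually matches is below. It swaps eregi() for preg_match(), since the ereg functions were removed in PHP 7. The helper name extractHeadlines() and the sample lines are hypothetical and stand in for the lines read from the page.

```php
<?php
// Hypothetical helper: returns one headline per line that matches the
// headline pattern; lines that do not match are skipped entirely.
function extractHeadlines(array $lines): array {
    // Same markup as the original patterns, rewritten for preg_match().
    $headlinePattern = '~FACE="arial,helvetica" SIZE="4" COLOR="#FFFFFF"><B>(.*)</B></FONT></TD>~i';
    $linkPattern = '~<A HREF="(.*)"><FONT COLOR="#FFFFFF">(.*)</FONT></A>: (.*)~i';

    $headlines = [];
    foreach ($lines as $line) {
        // Skip the line unless the headline pattern matched on THIS line.
        if (!preg_match($headlinePattern, $line, $matches)) {
            continue;
        }
        // Strip the leading topic link when the headline has one.
        if (preg_match($linkPattern, $matches[1], $linkRem)) {
            $headlines[] = $linkRem[3];
        } else {
            $headlines[] = $matches[1];
        }
    }
    return $headlines;
}

// Made-up sample input (an assumption, not the real page source).
$sampleLines = [
    '<TD><FONT FACE="arial,helvetica" SIZE="4" COLOR="#FFFFFF"><B>First headline</B></FONT></TD>',
    '<P>Some unrelated markup</P>',
    '<TD><FONT FACE="arial,helvetica" SIZE="4" COLOR="#FFFFFF"><B><A HREF="/x"><FONT COLOR="#FFFFFF">Topic</FONT></A>: Second headline</B></FONT></TD>',
];

foreach (extractHeadlines($sampleLines) as $headline) {
    print("$headline<br>\n");
}
```

With the match test guarding the print, the unrelated line in the sample contributes nothing, so each headline should appear once. The same guard could be applied inside the original fgets() loop.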