Hello,
I have dumped a number of RSS feeds into a MySQL table. These feeds are news RSS feeds (AP, Reuters, ABC, CNN, CBS News, Digg, Slashdot, etc.).
I am having difficulty writing a script that displays the content of each of the feeds WITHOUT showing duplicate articles. Instead I want duplicate articles to be listed under the initial article as "Related Headlines" -- much like Google News does. The question is: How to do this?
Example:
Title: Rumsfeld Resigns from Iraq.
Source: Yahoo! News Date: Dec 3, 2006
Article Body: Today, Donald Rumsfeld has resigned from Iraq. Millions of Iraqis are partying in the streets.
Related Headlines: "Bush Fires Rumsfeld" - ABC News
"Rumsfeld is No More" - CNN News
Get the idea? I am using similar_text() to compare the headlines of each article. If the similarity is >70% then the compared headline is to be removed from the array so that it isn't displayed as an independent article but will be displayed as a Related Headline for that article.
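For reference, here is a minimal sketch of how similar_text() reports similarity. The function returns the number of matching characters, and the percentage (0 to 100) is only available through the optional third argument, which is passed by reference. The helper name headline_similarity() is hypothetical and only used for illustration.

```php
<?php
// Hypothetical helper: percentage similarity (0-100) of two strings.
function headline_similarity(string $a, string $b): float
{
    // similar_text() returns the count of matching characters;
    // the percentage is written into the by-reference third argument.
    similar_text($a, $b, $percent);
    return $percent;
}

// The return value is a character count, not a ratio.
$matching = similar_text('World', 'Word');          // 4 matching characters
// The percentage comes from the third argument.
$percent  = headline_similarity('World', 'Word');   // roughly 88.9
```

So a 70% threshold would be checked against the percentage, not against the return value.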
So here is my code attempt:
<?php
$query = "select id, headline, intro, body, author, date, source, vote, xmlsitetype from anews1 where xmlsitetype = 0 ORDER BY date DESC LIMIT 10";
$result = mysql_query($query);
// Go through each news item from the database table
while ($rownews = mysql_fetch_assoc($result)){
// find similar_text and flag duplicate stories to be displayed as Related Headlines
$dupquery = "SELECT id, headline, intro, body, author, date, source, vote, xmlsitetype from anews1";
echo $dupquery;
$dupresult = mysql_query($dupquery);
// strip unneeded characters like quotes out of headlines to clean them up:
$cleanheadlines = array("\"", "'");
while ($dupcheck = mysql_fetch_assoc($dupresult)){
foreach ($dupcheck as $key => $dupcheck['headline']){
$str1 = str_replace($cleanheadlines, "", addslashes($rownews['headline']));
$str2 = str_replace($cleanheadlines, "", addslashes($dupcheck['headline']));
echo "<p><b>str1</b> is: ".$str1."<br><b>str2</b> is: ". $str2."<p>";
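// similar_text() returns the number of matching characters; the percentage (0-100) is written to $p
similar_text($str1, $str2, $p);
if ($p > 70){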
echo '<b><u>phrases are similar</u></b>';
// flag str2 in the array so that it is not displayed as an independent article, but rather as a Related Headline
} else {
echo 'phrases not similar';
// if phrases are not similar then print this article as the next independent news article in the news list
}
echo "Percent: $p%";
}
}
?>
<ol>
<li>
<strong><a href="<?php echo $rownews['source']; ?>"><?php echo $rownews['headline']; ?></a></strong><br />
<span style="font-size:0.8em; color:#999; height: 10px;">→ <a href="<?php echo $rownews['source']; ?>"><?php echo $rownews['date']; ?></a> | <?php echo $rownews['date']; ?></span><br />
<div style="font-size:1em; color:#000; height: 130px;"><?php echo $rownews['body']; ?></div>
</li>
<?php
}
?>
</ol>
</div>
The problem with this code is that the foreach loop iterates over EVERY field of the row, so $str2 is assigned each field in turn and each one gets printed and compared. I only want $str2 to be assigned the headline field (the second element of the row) so that I can then compare it with $str1.
How do I do this?
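One possible direction, sketched under assumptions: read the headline field directly by its key instead of looping over the row, and collect similar articles under the first article they match. The functions normalize_headline() and group_related(), the 70% default threshold, and the sample headlines are all hypothetical and made up for illustration. The sketch works on a plain in-memory array rather than a database result.

```php
<?php
// Hypothetical helper: clean a headline before comparing it.
function normalize_headline(string $headline): string
{
    // Drop single and double quotes, then lower-case the text.
    return strtolower(str_replace(['"', "'"], '', $headline));
}

// Hypothetical helper: put each article under the first earlier article
// whose headline is more than $threshold percent similar.
function group_related(array $articles, float $threshold = 70.0): array
{
    $groups = [];
    foreach ($articles as $article) {
        $placed = false;
        foreach ($groups as &$group) {
            // Compare only the 'headline' field, accessed by key.
            similar_text(
                normalize_headline($group['main']['headline']),
                normalize_headline($article['headline']),
                $percent
            );
            if ($percent > $threshold) {
                $group['related'][] = $article; // becomes a Related Headline
                $placed = true;
                break;
            }
        }
        unset($group); // release the reference left by the inner loop
        if (!$placed) {
            // No similar headline so far: start a new independent article.
            $groups[] = ['main' => $article, 'related' => []];
        }
    }
    return $groups;
}

// Made-up sample rows standing in for the database result.
$articles = [
    ['headline' => 'Rumsfeld Resigns',    'source' => 'Yahoo! News'],
    ['headline' => 'Rumsfeld resigns!',   'source' => 'ABC News'],
    ['headline' => 'Stock markets rally', 'source' => 'CNN News'],
];
$groups = group_related($articles);
```

Each entry in $groups then holds one independent article in 'main' and its Related Headlines in 'related'. Note that similar_text() compares characters, so headlines that describe the same story in different words may still fall below the threshold.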