Hi all,
Here is a trimmed-down script, originally created by Julian Bond
at www.voidstar.com. Much love to you, Julian 🙂
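<?php
// register_globals is off in modern PHP, so read the query string explicitly
$q = isset($_GET["q"]) ? $_GET["q"] : "";
$sort = isset($_GET["sort"]) ? $_GET["sort"] : "";
if ($q !== "") {
parse_html($q, $sort);
} else {
show_form();
}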
//****************
function show_form() {
$server = $_SERVER["SERVER_NAME"];
$request = $_SERVER["REQUEST_URI"];
?>
<form action="<?php print htmlspecialchars("http://" . $server . $request); ?>">
<br />Number of entries to return: <select name="num">
<option value="5">5</option>
<option value="10">10</option>
<option value="15" selected>15</option>
<option value="20">20</option>
<option value="25">25</option>
<option value="50">50</option>
<option value="75">75</option>
<option value="100">100</option>
</select>
<br />Search Query: <input type="text" name="q" size="50">
<br /><input type="submit" value="Create RSS">
</form>
<?php
}
//****************
function parse_html($q, $sort=""){
header("Cache-Control: public");
$itemregexp = "%><a href=\"(.+?)\".+?>(.+?)</a><br><font size=-1><font color=#6f6f6f>(.+?)</font>(<br>|)</table>%is";
// $itemregexp = "%<a class=y href=\"(.+?)\">(.+?)<br><font size=-1><font color=#6f6f6f>(.+?)</font><br></table>%is";
$allowable_tags = "<A><B><BR><BLOCKQUOTE><CENTER><DD><DL><DT><HR><I><IMG><LI><OL><P><PRE><U><UL>";
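// $num is not in scope inside this function, so read it from the request
$num = isset($_GET["num"]) ? (int) $_GET["num"] : 15;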
$url = "http://news.google.com/news?hl=en&num=$num&q=".urlencode($q);
$ch = curl_init($url);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_ENCODING,"");
curl_setopt ($ch, CURLOPT_HEADER, 1);
$data = curl_exec($ch);
curl_close ($ch);
header("Content-Type: text/xml");
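// grab the page title before trimming $data down to the results
preg_match("%<title>(.*?)</title>%is", $data, $title);
$channel_title = isset($title[1]) ? $title[1] : "";
$data = strstr($data, "<b>Sorted by");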
$match_count = preg_match_all($itemregexp, $data, $items);
$match_count = ($match_count > 25) ? 25 : $match_count;
$output = "<?xml version=\"1.0\" encoding=\"windows-1252\" ?>\n";
$output .= "<!-- generator=\"gnews2rss/1.0\" -->\n";
$output .= "<!DOCTYPE rss >\n";
$output .= "<rss version=\"2.0\">\n";
$output .= " <channel>\n";
$output .= " <title>G News Search: " . htmlspecialchars($q) . "</title>\n";
$output .= " <link>" . htmlspecialchars($url) . "</link>\n";
$output .= " <description>G News Search: " . htmlspecialchars($q) . "</description>\n";
$output .= " <webMaster>julian_bond@voidstar.com</webMaster>\n";
$output .= " <language>en-us</language>\n";
$output .= " <generator>GNews2Rss (http://www.voidstar.com/gnews2rss.php)</generator>\n";
for ($i=0; $i< $match_count; $i++) {
$item_url = $items[1][$i];
$title = $items[2][$i];
$title = strip_tags($title);
$desc = $items[3][$i];
$desc = preg_replace("% - .*? ago</font><br>%i", "<br>", $desc);
$desc = strip_tags($desc, $allowable_tags);
$desc = htmlspecialchars($desc);
$output .= " <item>\n";
$output .= " <title>". htmlspecialchars($title) ."</title>\n";
$output .= " <link>". htmlspecialchars($item_url) ."</link>\n";
$output .= " <description>". $desc ."</description>\n";
$output .= " </item>\n";
}
$output .= " </channel>\n";
$output .= "</rss>\n";
print $output;
}
?>
... so the original script Julian wrote returned Google News
results sorted by date. This tweaked version (which Julian
was gracious enough to supply) returns Google News results
sorted by relevance. The problem? When results are sorted by
relevance, some of them come back with an image (plus the
image's own text and link) attached to the search result.
In a nutshell, this creates two issues:
- The contents of each affected <title> tag are prefixed by
the image's text.
- The <link> tag contains the image's URL (instead of the URL
of the first item in the result).
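A minimal sketch reproducing both symptoms, assuming a hypothetical, simplified result snippet (Google's real markup is not shown here and will differ). The pattern's leftmost match starts at the image's <a href=...>, so group 1 captures the image URL, and the lazy title group runs on past the image link until it reaches the headline's </a><br><font size=-1>. The fix_item() helper is a hypothetical name for one untested idea: keep only the last link found inside the captured title.

```php
<?php
// HYPOTHETICAL, simplified stand-in for one relevance-sorted result that
// carries an image. Google's real markup will differ; only the overall
// shape (image link first, then headline link) matters here.
$html = '<td><a href="http://example.com/img-story"><img src="thumb.jpg">Photo caption</a></td>'
      . '<td><a href="http://example.com/story">Headline text</a><br><font size=-1>'
      . '<font color=#6f6f6f>Example Source</font><br></table>';

// Same pattern as $itemregexp in the script.
$itemregexp = "%><a href=\"(.+?)\".+?>(.+?)</a><br><font size=-1><font color=#6f6f6f>(.+?)</font>(<br>|)</table>%is";

preg_match_all($itemregexp, $html, $items);

// The leftmost match begins at the image's <a href=...>, so:
echo $items[1][0], "\n";             // http://example.com/img-story (the image's URL)
echo strip_tags($items[2][0]), "\n"; // Photo captionHeadline text (image text + headline)

// fix_item() is a HYPOTHETICAL helper, an untested idea: if the captured
// title still contains an <a href="..."> tag, treat the last one as the
// real headline link and keep only the text that follows it.
function fix_item($url, $raw_title) {
    $n = preg_match_all('%<a href="([^"]+)"[^>]*>%i', $raw_title, $m, PREG_OFFSET_CAPTURE);
    if ($n > 0) {
        $last = $n - 1;
        $url = $m[1][$last][0];
        // skip past the last opening <a ...> tag
        $raw_title = substr($raw_title, $m[0][$last][1] + strlen($m[0][$last][0]));
    }
    return array($url, strip_tags($raw_title));
}

list($item_url, $title) = fix_item($items[1][0], $items[2][0]);
echo $item_url, "\n"; // http://example.com/story
echo $title, "\n";    // Headline text
```

In the script, something like fix_item() would be called inside the for loop in place of the lines that read $items[1][$i] and $items[2][$i]; whether it holds up depends on Google's actual markup.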
So can anyone see how I can correct this problem? I am
something of a PHP novice and don't know where to begin.
Please let me know if I haven't explained the situation well
enough and I will try to elaborate.
Thanks for any help,
Reflex.