I'm trying to make an app that will harvest news from various sites and convert it into a specialist format.
For each site in my config file I strip the tags to a minimum. The following is an example of the string I'm working with:
$this->site="<a href=\"http://www.ananova.com./story/sm_771305.html?menu=news.latestheadlines\" title=\"Hopes of topping 4000 barrier dashed\">Hopes of topping 4000 barrier dashed</a><br/>London's FTSE 100 Index has closed more than 1% lower as investors took profits after a three-day upward run.<br/><i>18:00 Wednesday 16th April 2003</i>
<a href=\"http://news.bbc.co.uk/low/english/pda/front_page/newsid_2951000/2951817.htm\"><b>Huntley denies Soham murders</b></a><br/>
<a href=\"http://www.theregister.co.uk/content/35/30306.html\">Liberation abroad, repression at home - a student's tale</a> <b>Letter of the Week</b> Don't drink and share music 16 April 2003 7:47pm <br /><a href=\"http://www.theregister.co.uk/content/22/30303.html\">Freeserve and Oftel both claim victory in key competition dispute</a> ";
I am currently running this preg_match_all call against it, with some odd results:
preg_match_all ('/<a.*href="(.*)"\>(.*)<\/a>/', $this->site, $output);
for ($i = 0; $i < count($output[1]); $i++)
{
    $this->display_block .= "\n";
    $this->display_block .= "<br>";
    $this->display_block .= "<br>LINK:<br>".$output[1][$i]."<br>HEADLINE:<BR>".$output[2][$i];
}
print $this->display_block;
For the Ananova link I get:
$output[1][$i] as http://www.ananova.com./story/sm_771305.html?menu=news.latestheadlines" title="Hopes of topping 4000 barrier dashed
(so the title attribute ends up inside the captured URL)
For BBC all seems fine.
For The Register links I ONLY get the last URL and headline. Even if I delete the current last Register link, I still get only whichever link is then last.
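In case it helps, here is a cut-down test case that seems to reproduce the Register behaviour. The example.com URLs and the $html variable are made up for illustration; the pattern is the same one as above. My guess is that it is related to .* being greedy by default in PCRE, but I have not confirmed that this is the whole story.

```php
<?php
// Reduced test case (hypothetical data): two links on a single line,
// matched with the same pattern used on $this->site above.
$html = '<a href="http://example.com/a.html">First</a> text '
      . '<a href="http://example.com/b.html">Second</a>';

preg_match_all('/<a.*href="(.*)"\>(.*)<\/a>/', $html, $output);

// $output[0] holds the full matches, $output[1] the captured URLs,
// $output[2] the captured headlines.
print "matches: " . count($output[0]) . "\n";
for ($i = 0; $i < count($output[1]); $i++) {
    print "LINK: " . $output[1][$i] . "\n";
    print "HEADLINE: " . $output[2][$i] . "\n";
}
```

With both links on one line, this reports a single match containing only the second link, which looks like the same symptom I see with the Register entries.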
Your help would be much appreciated.
Many thanks
emdee