Say I have the following HTML page:
<a href="http://story.myurl">News Headline</a>
<small>news source)</small>
<br>
Here is the body of the newsitem
<br>- <small><i>
Date Posted
</i></small><p>
<p>
<p><img src=http://mysite.com/myjpg.gif>
<a href="http://specialurl.mysite.com">Story I don't want</a>
<font color=red>special</font>
<small>News Source</small>
<br>
here is the body of the newsitem i don't want
<br>- <small><i>
Date Posted
</i></small>
<p>
<a href="http://story.myurl">News Headline</a>
<small>news source)</small>
<br>
Here is the body of the newsitem
<br>- <small><i>
Date Posted
</i></small><p>
<a href="http://story.myurl">News Headline</a>
<small>news source)</small>
<br>
Here is the body of the newsitem
<br>- <small><i>
Date Posted
</i></small><p>
I want to grab each headline and put each part of the story "item" (headline, source, body, and date) into a variable. I can do this, no problem. HOWEVER, look at the item above whose link points to specialurl and whose headline is "Story I don't want". I don't want that item going into a variable. Why? Because the image right before it marks it as an item I don't want; none of the other headlines have that image. I have put the entire page into a string. I was thinking there might be some way to search for that whole news item, from the image through the date, and replace it with blanks, but I am having no luck. Here is my PHP code to grab the parts I need:
/****************************************************************************/
//THIS WORKS FOR URL
//preg_match_all("|href=\"http://story?([^\"' >]+)|i", $html_string, $out_link);
/***************************************************************************/
/*******************************************************************/
// THIS WORKS FOR GETTING NEWSOURCE
preg_match_all("/<small>(.*)<\/small>/", $html_string, $out_newsource);
/********************************************************************/
/***************************************************************************/
// THIS WORKS FOR GETTING THE DATE
preg_match_all("/<small><i>\n(.*)\n<\/i><\/small>/", $html_string, $out_date);
/****************************************************************************/
/**********************************************************************/
//THIS WORKS FOR GETTING THE BODY
preg_match_all("/<br>\n(.*)\n<br>/", $html_string, $out_body);
/**********************************************************************/
/****************************************************************************/
//THIS WORKS FOR GETTING THE URL AND TITLE
preg_match_all("/<a href=\"http:\/\/story(.*)<\/a>/", $html_string, $out_href);
/*****************************************************************************/
Any ideas?
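For what it's worth, the search-and-replace idea described above might look roughly like the following. This is only a minimal sketch under some assumptions: the marker image tag appears exactly as `<img src=http://mysite.com/myjpg.gif>`, the unwanted item always ends at the first `</i></small>` after that image, and the names `$cleaned` and the shortened `PAGE` sample are made up for illustration. The `s` modifier lets `.` match newlines, and the lazy `.*?` keeps the match from running past the first date block.

```php
<?php
// Hypothetical, shortened stand-in for the real page held in $html_string.
$html_string = <<<'PAGE'
<a href="http://story.myurl">News Headline</a>
<small>news source)</small>
<br>
Here is the body of the newsitem
<br>- <small><i>
Date Posted
</i></small><p>
<p><img src=http://mysite.com/myjpg.gif>
<a href="http://specialurl.mysite.com">Story I don't want</a>
<font color=red>special</font>
<small>News Source</small>
<br>
here is the body of the item to skip
<br>- <small><i>
Date Posted
</i></small>
<p>
<a href="http://story.myurl">News Headline</a>
<small>news source)</small>
<br>
Here is the body of the newsitem
<br>- <small><i>
Date Posted
</i></small><p>
PAGE;

// Match from the marker image through the first closing </i></small> after it.
// "i" = case-insensitive, "s" = let "." span newlines, ".*?" = lazy match.
$pattern = '~<img\s+src=http://mysite\.com/myjpg\.gif>.*?</i></small>~is';

// Replace the whole unwanted item with an empty string.
$cleaned = preg_replace($pattern, '', $html_string);

// The existing patterns can then run against $cleaned instead of $html_string.
preg_match_all("/<small>(.*)<\/small>/", $cleaned, $out_newsource);
preg_match_all("/<a href=\"http:\/\/story(.*)<\/a>/", $cleaned, $out_href);
```

Running the other `preg_match_all` calls against `$cleaned` in the same way should keep the source, date, and body arrays lined up with the headlines, since the unwanted item no longer contributes matches. If the image tag varies (quotes around `src`, extra attributes), the pattern would need loosening.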