My apologies for the lack of information.
As you might have guessed, this is a website for a newspaper. The pages (hundreds, mind you) are in basic html format. The front page news, for example, runs all of its stories in one html doc (as opposed to a separate doc for each story).
Each section is separated by html comments like <!-- ARTICLE 1 --> (and, by the way, the same comment appears at both the top and the bottom of each article). Between those html comments are the article's headline, lead, and story information (including photos, etc).
The php script is supposed to parse this news page and generate a web page containing a preview of each story (a headline, a thumbnail of a photo, and a lead into the story). If the user wants to read further, they click a "read more" link that takes them to the html page.
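Just to make the goal concrete, the preview page could be assembled from each article's pieces once they've been pulled out. This is only a rough sketch under my assumptions about the data (the array keys, function name, and link target are made up for illustration):

```php
<?php
// Hypothetical sketch: turn one article's pieces into a preview block.
// $article keys and $pageUrl are assumptions, not the actual script.
function makePreview(array $article, string $pageUrl): string
{
    $headline = htmlspecialchars($article['headline']);
    $lead     = htmlspecialchars($article['lead']);
    $thumb    = htmlspecialchars($article['thumbnail']);

    return "<div class=\"preview\">\n"
         . "  <h2>$headline</h2>\n"
         . "  <img src=\"$thumb\" alt=\"\">\n"
         . "  <p>$lead <a href=\"$pageUrl\">read more</a></p>\n"
         . "</div>\n";
}
?>
```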
Traditionally, I have managed this "preview" page on my own...meaning it was a separate html document (I simply do the copy-and-paste routine from the news page). I am trying to automate that process and effectively eliminate the time spent on that page.
These html comments in the news page were put there to make editing easier (I work strictly in html, and won't use a wysiwyg editor). After we started heavily implementing php on the site, I realized I didn't have much in these news pages to distinguish each article, besides the html tags.
I can parse each story individually (extract the headline and story info) if I can first extract just the content between the html comments, but it becomes difficult if I'm scanning for headlines in the entire page (with all the articles).
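In case it helps to see the idea, here's a minimal sketch of that first step, assuming the comments really are identical above and below each article (e.g. <!-- ARTICLE 1 --> ... <!-- ARTICLE 1 -->). A back-referencing regex can grab just the content between each matching pair, so the headline scanning only ever sees one article at a time (the file name is an assumption):

```php
<?php
// Minimal sketch, assuming identical "<!-- ARTICLE n -->" comments open
// and close every article. 'frontpage.html' is a placeholder name.
$html = file_get_contents('frontpage.html');

// \1 refers back to the captured article number, so the closing comment
// must match the opening one. The "s" modifier lets "." span newlines.
preg_match_all(
    '/<!--\s*ARTICLE\s+(\d+)\s*-->(.*?)<!--\s*ARTICLE\s+\1\s*-->/s',
    $html,
    $matches,
    PREG_SET_ORDER
);

foreach ($matches as $m) {
    $articleNumber = $m[1]; // e.g. "1"
    $articleHtml   = $m[2]; // everything between the two matching comments

    // From here the headline, lead, photo, etc. can be pulled out of
    // $articleHtml alone, instead of scanning the whole page.
}
?>
```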
Long story....whew! Hope that explains my situation better. I don't want to go back and change the html pages, because there are hundreds of them.
Thanks for the help thus far.