1. If all those pages are in the same directory,
I would use [man]glob[/man] to get the names of all pages in that directory.
2. Then, in a loop, I would use [man]file_get_contents[/man]
to load each page, one after another, into a string.
3. On each such page string I would use [man]preg_match[/man]
to find the content between <p class="topic_start"> and <p class="topic_footer">.
The regex below captures only the content, NOT the enclosing tags <p class="topic_start"> and </p>.
But this is easily changed if you wish.
<?php
//The regex pattern we will search for
//The part inside the parentheses is what we capture: (.*)
$searchfor = '#<p class="topic_start">(.*)</p>.*<p class="topic_footer">#s';
// where we will collect all results
$results = array();
//find each page with extension .html in current directory
foreach (glob('*.html') as $file) {
    //read the whole page into $string
    $string = file_get_contents($file);
    if ($string === false) {
        //skip pages that could not be read
        continue;
    }
    //find the part we $searchfor; preg_match returns 1 only when it matched
    if (preg_match($searchfor, $string, $match) === 1) {
        //store each $match into results array
        $results[] = trim($match[1]);
    }
    //loop to next and repeat foreach
}
//display all results
print_r($results);
exit('the end');
/* MY TESTING
$string = '<p class="topic_start">
<p class="title"> The Maths Evaluation </p>
<p class="strips"> </p>
<p class="contentlioad">
The Maths evaluation period......
...
.....
.....
</p>
</p>
<p class="topic_footer"><img src="images/footer_img.gig"></p>';
$search = '#<p class="topic_start">(.*)</p>.*<p class="topic_footer">#s';
preg_match($search, $string, $match);
print_r(trim($match[1]));
exit();
*/
?>
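
If you do want the enclosing tags as well, one possible variation is to move the opening tag and the closing </p> inside the parentheses. This is only a minimal sketch, assuming the pages look like my test string. The function name extract_topic_with_tags and the sample HTML are made up for illustration, and the nullable return type needs PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not part of the script above): returns the topic
// block INCLUDING <p class="topic_start"> and its closing </p>,
// or null when the page does not contain the expected markers.
function extract_topic_with_tags(string $html): ?string
{
    // The parentheses now wrap the opening tag and the closing </p> too.
    $pattern = '#(<p class="topic_start">.*</p>).*<p class="topic_footer">#s';
    if (preg_match($pattern, $html, $match) === 1) {
        return trim($match[1]);
    }
    return null;
}

// Made-up sample page, for illustration only
$sample = '<p class="topic_start"><p class="title">Demo</p></p>'
        . "\n" . '<p class="topic_footer">footer</p>';

var_dump(extract_topic_with_tags($sample));
var_dump(extract_topic_with_tags('<p>no markers here</p>'));
```

Returning null for pages without the markers lets the calling loop decide whether to skip such a page or report it.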