Hi,
I'm writing a script that parses an HTML file containing a table of data and inserts the data into a MySQL database.
I'm trying to extract the contents of every cell ('<td></td>') using the preg_match_all function.
// match all <td> </td> tags and get the data
preg_match_all("|<[td>]+(.*)</[td>]+>|U", $content, $regs);
$datacnt = count($regs[0]);
$databasedata = ""; // combine all the data
echo "\n";
for ($i = 0; $i < $datacnt; $i++) {
    // insert the data into the database
    if ($i % 16 == 0) {
        echo "\n";
        if ($rowcount < $codeopen - 1) {
            echo $databasedata . "\n\n";
        }
        if ($rowcount > 1) --$rowcount;
        $databasedata = "";
    }
    // remove all html tags and decode html entities
    $data = html_entity_decode(strip_tags($regs[1][$i]));
}
The problem I have is that I can't work out the correct pattern for preg_match_all that will pick up both '<td>' and '<td align="left">'.
The code as it stands only picks up '<td>'.
I'm not that familiar with regular expression patterns.
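As far as I can tell, '[td>]+' is a character class, so it only matches runs of the characters 't', 'd' and '>', which would explain why a tag containing a space and attributes is skipped. Below is a minimal sketch of the direction I think I need, not a confirmed solution. The sample $content string is made up for testing, and the pattern assumes the cells are not nested and that no attribute value contains a '>'.

```php
<?php
// Hypothetical sample input; the real $content comes from the HTML file.
$content = '<tr><td>Alpha</td><td align="left">Beta &amp; Gamma</td></tr>';

// <td\b  : the literal tag name, ending at a word boundary
// [^>]*  : optional attributes, everything up to the closing '>'
// (.*?)  : the cell contents, non-greedy so each cell stops at its own </td>
// Flags  : i = case-insensitive tags, s = let '.' match newlines inside a cell
preg_match_all('|<td\b[^>]*>(.*?)</td>|is', $content, $regs);

// Same clean-up as in my loop: strip inner tags, then decode entities.
$cells = array();
foreach ($regs[1] as $raw) {
    $cells[] = html_entity_decode(strip_tags($raw));
}

print_r($cells);
```

With the sample string above, this should capture both the plain '<td>' cell and the '<td align="left">' cell.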
Any ideas?
Thanks