Regexping my way through table cells

nocando

Hello all!
I'm "fopening" an html-file containing a huge table that looks kinda like this:

<tr style='background-color: white;'>
    <td style='font-weight: bold;'>Value1</td>
    <td>Value2</td>
    <td style='text-align:left;'>Value3</td>
    <td style='text-align:left;'>Value4</td>
    <td class='small'>Value5</td>
    <td class='small'>Value6</td>
    <td></td>
    <td></td>
</tr>

But how do I collect the values from the table into an array?

bradgrafelman

Well, you can retrieve anything between <td></td> tags using [man]preg_match_all[/man] with this pattern:

$pattern = '@<td[^>]*>(.+?)</td>@s';

Note that if you want it to match <TD>, <tD>, etc., you need to add the 'i' modifier at the end.
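
As a minimal sketch of the pattern in use, with the 'i' modifier added: the $html string below is a made-up stand-in for the real file contents, and $values is just an illustrative variable name. The text inside each cell ends up in the first capture group, i.e. index 1 of the result array.

```php
<?php
// Hypothetical sample markup standing in for the real file contents.
$html = "<tr>\n    <td style='font-weight: bold;'>Value1</td>\n    <TD>Value2</TD>\n</tr>";

// 's' lets '.' match newlines; 'i' makes <td>, <TD>, <tD> all match.
$pattern = '@<td[^>]*>(.+?)</td>@si';
preg_match_all($pattern, $html, $matches);

// $matches[1] holds what the parentheses captured: the cell contents only.
$values = $matches[1];
print_r($values);
```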

nocando

Ok, great thanks! 🙂
But I'm not sure how to do this (sorry), my screen only says 'Array' even though I am echoing it as an array:

<?php
$data = file_get_contents("http://www.mysite.com/");
$pattern = "@<td[^>]*>(.+?)</td>@s";
preg_match_all($pattern,$data,$match);
echo $match[0];
?>

Why is that?

bradgrafelman

Visit the manual page for [man]preg_match_all[/man] to learn how it fills the $match array.

To see the structure of $match, you could always do a [man]print_r[/man] on it.
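
A short sketch of what that looks like, assuming the default PREG_PATTERN_ORDER flag and a made-up $data string in place of the real page: $match[0] is itself an array (of full matches, tags included), which is why echoing it prints only 'Array', while $match[1] is the array of captured cell contents.

```php
<?php
// Hypothetical input; the real script would read this from the page.
$data = "<td>Value1</td><td class='small'>Value2</td>";
$pattern = '@<td[^>]*>(.+?)</td>@s';
preg_match_all($pattern, $data, $match);

// Dump the whole structure: index 0 = full matches, index 1 = first group.
print_r($match);

// echo works on strings, so loop over the array and echo each element.
foreach ($match[1] as $value) {
    echo $value, "\n";
}
```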

nocando

Of course, thanks a bunch!
But when I echo the array, empty values (between <td> and </td>) seem to mess it up a bit. Whenever there's an empty value, the closing td-tag and the next starting td-tag are included in the array. How do I get rid of the td-tags?

Edit. Sorry, I'm unnecessarily spamming this forum. I found it out myself.
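
The thread does not say what the fix was. One likely cause, offered here as an assumption, is that (.+?) requires at least one character, so on an empty cell it runs past the closing tag into the next cell. Swapping in (.*?) allows an empty match. The $row string and variable names below are illustrative only.

```php
<?php
// Hypothetical row ending in two empty cells, like the table in the question.
$row = "<tr>\n    <td class='small'>Value6</td>\n    <td></td>\n    <td></td>\n</tr>";

// '+' needs at least one character, so an empty cell swallows "</td> ... <td>".
preg_match_all('@<td[^>]*>(.+?)</td>@s', $row, $withPlus);

// '*' also accepts zero characters, so each empty cell yields an empty string.
preg_match_all('@<td[^>]*>(.*?)</td>@s', $row, $withStar);

print_r($withPlus[1]);
print_r($withStar[1]);
```

With the '+' version the two empty cells collapse into a single entry containing the stray tags; with '*' they come back as two empty strings.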

bradgrafelman

So, did that resolve your issue? If so, don't forget to mark this thread resolved.