Help with Regular Expression

stovellp

Howdy,

I have an HTML file that has a few entries that look like this:

<TD align=CENTER BGCOLOR=#BBFFCC><A HREF=NT01.html>something
</A></TD>

<TD align=CENTER BGCOLOR=#BBFFCC><A HREF=NT02.html>anotherthing
</A></TD>

I want to get the contents of the cell (including the link) separate from the rest using regular expressions. This is what I have so far:

// $String is the entire HTML file, $Regs is an array.
while ( ereg( "(<TD align=CENTER BGCOLOR=#BBFFCC>)(<A.*<\/A>)(<\/TD>)", $String, $Regs) )
{
	// Remove the bit we just cut out so we don't get stuck in an endless loop.
	$String = str_replace($Regs[1].$Regs[2].$Regs[3], "", $String);

	echo 'Link and Text: '.$Regs[2].'<br />';
}

But it's not working. I think there's something wrong with the characters I'm using. $Regs[1] is "<TD align=CENTER BGCOLOR=#BBFFCC>", as it should be, and $Regs[3] is </TD>, as it should be. But $Regs[2] is everything from the <A of the first link it finds to the end of the HTML file (or maybe the end of its last link), which is not what it should be.

Thanks very much for any help 🙂

Shrike

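$string = "<td moo blah blah moo>This is the only bit I want</td>";
// .*? is non-greedy, so each part stops at the first possible match; '$1' is the captured cell contents.
$string2 = preg_replace("/<td.*?>(.*?)<\\/td>/si", '$1', $string);
echo $string2;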

Should output "This is the only bit I want"
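
Applied to the cells from your post, the same non-greedy idea might look something like the sketch below. It assumes the file contents are already in a string (the $html value here is just sample data copied from your post, and $pattern and $links are names I made up). It uses preg_match_all so every cell is collected in one pass and there is no need to cut matches out of the string in a loop.

```php
<?php
// Sample input modeled on the two cells in the question (stand-in for the real file).
$html = "<TD align=CENTER BGCOLOR=#BBFFCC><A HREF=NT01.html>something\n</A></TD>\n\n"
      . "<TD align=CENTER BGCOLOR=#BBFFCC><A HREF=NT02.html>anotherthing\n</A></TD>\n";

// .*? is lazy: it stops at the first </A> instead of running on to the last one.
// The s flag lets . match newlines; the i flag makes the match case-insensitive.
$pattern = '/<TD align=CENTER BGCOLOR=#BBFFCC>(<A.*?<\/A>)<\/TD>/si';

// $matches[0] holds the full matches, $matches[1] the first capture group of each.
preg_match_all($pattern, $html, $matches);

$links = $matches[1];
foreach ($links as $link) {
    echo 'Link and Text: ' . $link . "<br />\n";
}
```

With the plain .* from your original pattern, the match is greedy and runs to the last </A></TD> in the file, which is why $Regs[2] came out so large.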

hth