I'm using the preg_match function in PHP, and I want to grab a link tag
from a page, together with the preceding and following text, to
establish context.
If the link is inside a set of <div> tags, I only want to grab the text
within that div; if there are no <div>s, I want to grab the text all
the way out to the <body> tags.
For example, if the HTML is:
[FONT="Courier New"]<html>
<head>
</head>
<body>
aaa aaa aaa
<div>
bbb bbb bbb
<a href='http://www.domain.com/index,html'>link text</a>
ccc ccc ccc
</div>
ddd ddd ddd
</body>
</html>[/FONT]
I want to match three groups:
1: [FONT="Courier New"]bbb bbb bbb[/FONT]
2: [FONT="Courier New"]<a href='http://www.domain.com/index,html'>link text</a>[/FONT]
3: [FONT="Courier New"]ccc ccc ccc[/FONT]
On the other hand, if the divs weren't there and the HTML was:
[FONT="Courier New"]<html>
<head>
</head>
<body>
aaa aaa aaa
bbb bbb bbb
<a href='http://www.domain.com/index,html'>link text</a>
ccc ccc ccc
ddd ddd ddd
</body>
</html>[/FONT]
I'd want to match:
1: [FONT="Courier New"]aaa aaa aaa bbb bbb bbb[/FONT]
2: [FONT="Courier New"]<a href='http://www.domain.com/index,html'>link text</a>[/FONT]
3: [FONT="Courier New"]ccc ccc ccc ddd ddd ddd[/FONT]
The expression I'm working with is:
[FONT="Courier New"]#.<(?:div|body).?>(.?)(<a\s[>]?href\s?=\s?["']{0,1}http://www.domain.com/index.html['"]{0,1}.?>.?</a>)(.*?)</(?:div|body)#i[/FONT]
This is nearly there, because it works as expected in the Rad Software
Regular Expression Designer (http://www.radsoftware.com.au/regexdesigner/)
and in the similar Expresso tool (http://www.ultrapico.com/ExpressoBeta.htm),
but it returns no matches when I use it in PHP.
I guess this means something is implemented differently in PHP, but
what? Does anyone have a workaround to get the expression working
in PHP?
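
One thing that may be worth testing (a guess, not a confirmed diagnosis): in PHP's PCRE the dot does not match a newline unless the "s" modifier is set, so a pattern that has to span several lines of HTML can match in a tool where a single-line option happens to be switched on and still fail in preg_match. The sketch below uses a simplified stand-in pattern rather than the exact expression above, and the helper name extract_context, its $dotAll parameter, and the whitespace cleanup are all hypothetical choices made for illustration.

```php
<?php
// Hypothetical helper: returns the three groups with whitespace collapsed,
// or null when the pattern does not match.
function extract_context(string $html, bool $dotAll): ?array
{
    // Simplified stand-in pattern (an assumption, not the original expression).
    // The dot in "index.html" is left unescaped so it also matches the comma
    // that appears in the sample URL.
    $body = '.*<(?:div|body)[^>]*>(.*?)'
          . '(<a\s[^>]*href\s*=\s*["\']?http://www\.domain\.com/index.html["\']?[^>]*>.*?</a>)'
          . '(.*?)</(?:div|body)';

    // "i" = case-insensitive; "s" = dot also matches newlines.
    $pattern = '#' . $body . '#i' . ($dotAll ? 's' : '');

    if (preg_match($pattern, $html, $m) !== 1) {
        return null;
    }

    // Collapse runs of whitespace so multi-line text reads as one line.
    $clean = function (string $s): string {
        return trim(preg_replace('/\s+/', ' ', $s));
    };

    return [$clean($m[1]), $clean($m[2]), $clean($m[3])];
}

// The first sample page from the post, written out as one string.
$withDiv = "<html>\n<head>\n</head>\n<body>\naaa aaa aaa\n<div>\nbbb bbb bbb\n"
         . "<a href='http://www.domain.com/index,html'>link text</a>\n"
         . "ccc ccc ccc\n</div>\nddd ddd ddd\n</body>\n</html>";

// The second sample page: the same markup with the <div> lines removed.
$noDiv = str_replace(["<div>\n", "</div>\n"], '', $withDiv);

var_dump(extract_context($withDiv, false)); // no "s" modifier: NULL
print_r(extract_context($withDiv, true));   // groups taken from inside the <div>
print_r(extract_context($noDiv, true));     // groups taken from inside the <body>
```

If the version with the "s" modifier matches and the one without it does not, the difference between PHP and the two tools is probably just this option rather than anything missing from PHP's regex support.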