Hi all,
I want to do the following: replace text everywhere outside of HTML tags, except between <a ...> and </a> tags. In other words, replace the page text, but NOT anything inside tags or links.
Let me give you an example of what I mean.
Say I have the following text:
----(before)-----
<html><body>
<div class=word>
<a href="word">stuff word stuff</a>
<img src="word">
<p><b>stuff stuff word more stuff</b></p>
</div>
</body></html>
So, when replacing "word" with "NEW_WORD", the text above should become:
----(after)-------
<html><body>
<div class=word>
<a href="word">stuff word stuff</a>
<img src="word">
<p><b>stuff stuff NEW_WORD more stuff</b></p>
</div>
</body></html>
I have tried regular expressions and failed, because they don't handle the nested tags correctly. Let me know if anyone has a better idea than mine, or a regular expression that actually works.
I came up with an algorithm, but it may be too inefficient:
1. Replace every <a ...>...</a> with a unique placeholder (e.g. @$i@, where $i = 0..n instances) and store each link indexed by $i.
2. Run preg_replace on everything that isn't inside <...>.
3. Replace the placeholders with the links saved in step 1.
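A minimal sketch of the three steps above, written in Python (re.sub and a callback stand in for PHP's preg_replace / preg_replace_callback). It assumes links are never nested, that the page never already contains text shaped like @0@, and that "word" should only match as a whole word (the \b boundaries). The function name replace_outside_links is just a hypothetical name for illustration.

```python
import re

# Step 1 pattern: a whole link, <a ...>...</a>, matched non-greedily.
# Assumes links are not nested (HTML does not allow nested <a> anyway).
LINK_RE = re.compile(r"<a\b[^>]*>.*?</a>", re.IGNORECASE | re.DOTALL)
# Step 3 pattern: the @i@ placeholders created in step 1.
PLACEHOLDER_RE = re.compile(r"@(\d+)@")


def replace_outside_links(html, old, new):
    links = []

    # Step 1: swap each link for a unique placeholder @i@ and remember it.
    def stash(match):
        links.append(match.group(0))
        return "@%d@" % (len(links) - 1)

    html = LINK_RE.sub(stash, html)

    # Step 2: split on tags; the capturing group keeps the tags in the list,
    # so even indices are text and odd indices are tags like <img src="word">.
    # Only the text chunks are touched (whole-word match, an assumption).
    word_re = re.compile(r"\b" + re.escape(old) + r"\b")
    parts = re.split(r"(<[^>]*>)", html)
    for i in range(0, len(parts), 2):
        parts[i] = word_re.sub(new, parts[i])
    html = "".join(parts)

    # Step 3: put the original links back in place of the placeholders.
    return PLACEHOLDER_RE.sub(lambda m: links[int(m.group(1))], html)
```

Run on the "before" text with replace_outside_links(before, "word", "NEW_WORD"), this should only change the word inside <p><b>...</b></p>, leaving the class, href, src and link text alone. Each step is a single regex pass over the string, so for ordinary pages it is unlikely to be slow. For messy real-world HTML, a real parser (such as PHP's DOMDocument) is usually more robust than regexes.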
Thanks,
ionSphere