I recently wrote a function for a website that replaces certain keywords in blocks of text with links to the pages that have more details on those keywords. The website consists of a blog, news articles, etc., so the function is nothing more than a string of text as input with a series of str_replace calls inside.
I'm running into an issue where some of the text is HTML markup and should not be replaced. For example, I have a table with a summary attribute, and some of the text in the summary is being replaced with a link. This obviously creates malformed HTML output.
The difficulty is that I do want to replace some of the text inside the HTML. For example, text inside a table cell (between <td></td> tags) is replaceable, since it is visible to the user, but text inside the summary="" attribute of the table tag itself (i.e. <table width="550" summary="text here">) should not be replaced.
I was wondering if anyone knows of, or could write, a regular expression or something similar that could tell the difference and replace only where appropriate. I'm thinking the < and > could be used as delimiters, meaning nothing between an opening < and its closing > would be replaced. It may be better to use the quotes, or even both in conjunction. Any ideas on this would be greatly appreciated.
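For illustration, here is a minimal PHP sketch of the angle-bracket idea: split the input on anything that looks like a tag, then run the replacements only on the pieces in between. The function name link_keywords and the keyword-to-URL map are hypothetical, and strtr is swapped in for the chain of str_replace calls so that a link inserted for one keyword is not rescanned for another. It assumes no > character appears inside a quoted attribute value.

```php
<?php
// Hypothetical helper: link keywords in visible text only, leaving tags
// (and therefore attributes such as summary="...") untouched.
function link_keywords(string $html, array $links): string
{
    // Build keyword => anchor replacements once.
    $map = [];
    foreach ($links as $keyword => $url) {
        $map[$keyword] = '<a href="' . htmlspecialchars($url) . '">' . $keyword . '</a>';
    }

    // Split on anything shaped like <...>. PREG_SPLIT_DELIM_CAPTURE keeps
    // the tags in the result, so the pieces alternate: text, tag, text, ...
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false) {
        return $html; // regex failure: return the input unchanged
    }

    foreach ($parts as $i => $part) {
        // Even indexes are text between tags; odd indexes are the tags.
        if ($i % 2 === 0 && $part !== '') {
            // strtr replaces in a single pass and never rescans its own output.
            $parts[$i] = strtr($part, $map);
        }
    }

    return implode('', $parts);
}

// Example with a made-up keyword and URL.
$links = ['widgets' => '/pages/widgets'];
$html  = '<table width="550" summary="widgets list"><tr><td>All widgets here</td></tr></table>';
echo link_keywords($html, $links), "\n";
```

In this example the word inside summary="" is left alone, while the one inside the <td> cell becomes a link. Note that this sketch would still replace keywords inside an existing <a> element or inside <script> and <style> blocks; if that matters, walking the text nodes with a real HTML parser such as PHP's DOMDocument is likely the safer route.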