DssTrainer;10912139 wrote: Ok, I got it working with:
FIND: <img(.+)">
REPL: <img\1" />
That solution isn't entirely reliable. Consider the following example:
$str = '<img src="some/path/some/file"><a class="foo"></a>';
$str = preg_replace('#<img(.+)">#', '<img\1" />', $str);
echo $str;
If your image tag is the last tag before a newline, it will be ok. If not, you'll have problems.
What is important to understand is that quantifiers like + and * are greedy: they match as much as they can, then give characters back only as far as needed for the rest of the pattern to match. Here the quantifier is applied to the dot, which by default matches any character other than a newline. So in my example, (.+)"> does not stop at the "> that closes the image tag; it runs on and stops at the last "> on the line, the one that ends the opening anchor tag.
So using .+ or .* is generally frowned upon, not only for speed (the engine has to backtrack) but, even worse, for accuracy.
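To make the difference concrete, here is a small sketch that runs the same input through three patterns: the original greedy one, a lazy variant (.+?), and one that uses a negated character class ([^>]+) so the match cannot cross a tag boundary. The last two are my own illustrations, not something from the quoted post, and they assume no attribute value contains a literal > character.

```php
<?php
$str = '<img src="some/path/some/file"><a class="foo"></a>';

// Greedy: (.+) grabs the rest of the line, then backtracks to the LAST ">
$greedy = preg_replace('#<img(.+)">#', '<img\1" />', $str);

// Lazy: (.+?) expands one character at a time and stops at the FIRST ">
$lazy = preg_replace('#<img(.+?)">#', '<img\1" />', $str);

// Negated class: [^>]+ can never step over a ">", so it stays inside the img tag
$bounded = preg_replace('#<img([^>]+)">#', '<img\1" />', $str);

echo $greedy, "\n";  // <img src="some/path/some/file"><a class="foo" /></a>
echo $lazy, "\n";    // <img src="some/path/some/file" /><a class="foo"></a>
echo $bounded, "\n"; // <img src="some/path/some/file" /><a class="foo"></a>
```

The greedy version leaves the image tag untouched and "closes" the anchor tag instead. The negated class usually needs less backtracking than the lazy version, but neither is a general answer for arbitrary HTML.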
I gave a small example explaining backtracking here... (but that post didn't cover the accuracy issues, which is why I posted the small code snippet at the top of this reply.)
As for the proper solution to this issue, I'm honestly not sure what it would be. Nested stuff gets ugly in a hurry, and I don't know whether this can be done using DOMDocument or XPath.
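For what it's worth, here is a rough sketch of one direction the DOMDocument idea could take. It assumes the dom extension is available, and it only shows that a parser finds the image tag no matter what follows it on the line. It serializes each img node on its own and is not a complete rewrite of the document.

```php
<?php
$html = '<img src="some/path/some/file"><a class="foo"></a>';

$doc = new DOMDocument();
$doc->loadHTML($html); // the HTML parser accepts the unclosed <img>

$serialized = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    // saveXML() on a single node writes it with XML rules, so an
    // element with no children comes out self-closed
    $serialized[] = $doc->saveXML($img);
}

echo implode("\n", $serialized), "\n";
```

Whether this scales to rewriting a whole page is a separate question, since XML serialization applies the same self-closing rule to every empty element and not just to img.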