Are there any known issues with preg, or more specifically preg_match_all?
I had a regular expression that was meant to find all tags of certain formats. The problem is that it worked as it should some of the time and failed at other times. Whenever it failed, the subject string was long, and running the same pattern on a substring of no more than 600 characters worked. Coincidence?
As a workaround, I stopped matching all tag names generically with a back reference and instead went through the subject once per tag name I needed to find.
Patterns to find:
A) [tagname]inner text[/tagname]
B) [tagname tagtext]inner text[/tagname]
Note that the two patterns below differ in their capturing subpatterns, because I realized I needed the text between the tags captured for later use, without the tags themselves. Other than that, I expected the two versions to do the same job.
Pattern (1):
General pattern that did not work as I expected:
preg_match_all(
    "/(\[([^]]*)|(?:([^\x20]*)[^]]*)\])" . ".*?" . "\[\/\\2\]/suD",
    $str, $matches);
Pattern (2):
Tagname-specific pattern that works:
$findThis = array("topheadline", "headline", "tag", "ingress",
    "byline", "articlelink" );
for ($i = 0; $i < count($findThis); ++$i) {
    preg_match_all(
        "/\[" . $findThis[$i] . "(?:[ ][^]]*)?\](.*?)\[\/". $findThis[$i] . "\]/sDu",
        $str, $matches[$findThis[$i]]);
}
Pattern (1) also never failed if I replaced $str with substr($str, 0, 600). Does this mean that there are limitations on subject string length under certain circumstances? Or is that just pure coincidence?
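One way to narrow this down might be to check preg_last_error() right after the call, since a failed match and a legitimately empty result are otherwise hard to tell apart. The sketch below assumes PHP 5.2 or later, where preg_last_error() and the pcre.backtrack_limit setting exist. The helper name matchAllWithDiagnostics is hypothetical and not part of my original code; it only wraps preg_match_all and reports the PCRE error state by name.

```php
<?php
// Hypothetical helper (an illustration, not part of the original code):
// runs preg_match_all and reports which PCRE error, if any, occurred.
function matchAllWithDiagnostics($pattern, $subject)
{
    $matches = array();
    // preg_match_all returns the number of full matches, or false on error.
    $count = preg_match_all($pattern, $subject, $matches);

    // Map the documented PCRE error constants to readable names.
    $errorNames = array(
        PREG_NO_ERROR              => 'PREG_NO_ERROR',
        PREG_INTERNAL_ERROR        => 'PREG_INTERNAL_ERROR',
        PREG_BACKTRACK_LIMIT_ERROR => 'PREG_BACKTRACK_LIMIT_ERROR',
        PREG_RECURSION_LIMIT_ERROR => 'PREG_RECURSION_LIMIT_ERROR',
        PREG_BAD_UTF8_ERROR        => 'PREG_BAD_UTF8_ERROR',
    );
    $code = preg_last_error();

    return array(
        'count'   => $count,
        'error'   => isset($errorNames[$code]) ? $errorNames[$code] : 'UNKNOWN',
        'matches' => $matches,
    );
}

// Usage with a short sample subject and the tag-specific pattern for "tag".
$result = matchAllWithDiagnostics(
    '/\[tag(?:[ ][^]]*)?\](.*?)\[\/tag\]/sDu',
    '[tag]hello[/tag] [tag x]world[/tag]'
);
echo $result['error'], ' ', $result['count'], "\n";
```

If Pattern (1) reported PREG_BACKTRACK_LIMIT_ERROR on the long subject but PREG_NO_ERROR on the 600-character substring, that would suggest the limit is on backtracking work rather than on subject length as such.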
And finally, Pattern (2) performs much faster than Pattern (1) did. Any idea why?
Thankful for any input.
regards
johan