Finally I understand the problem. I'm being very slow today.
The regexp engine is eager as well as greedy, and it's the eagerness that's getting in the way. Once it finds a match it won't look to see if there are any more. Only if it finds more than one match starting from the same position does it need to choose which one to return, and that's when greediness comes into play. It has no reason to keep searching after that (in case there's a longer one to be found further in).
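To make the eagerness concrete, here is a small sketch. The sample string is made up for illustration; the pattern is just the capture-and-backreference idea with nothing else around it.

```php
<?php
// Made-up sample: "a" repeats early on, "bcdef" repeats later and is longer.
$sample = 'aXa bcdefbcdef';

// Capture a run of characters, allow any gap, then demand the same run again.
preg_match('/(.+).*\1/s', $sample, $m);

// The engine stops at the leftmost starting position that can match at all.
// Greediness only picks the longest capture available from that position.
echo $m[1], "\n"; // prints "a", not "bcdef"
```

The longer repeat is there, but the engine never gets as far as looking for it.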
So there's that, and there's the fact that regular expressions can't count.
preg_match_all() came to mind, but I can think of cases where it would fail, due to patterns overlapping.
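One such case, sketched with a made-up string: preg_match_all() resumes after the end of the previous match, so a repeat lying inside an earlier match is never examined.

```php
<?php
// Made-up sample: "a" brackets the whole string, "QQQQ" repeats inside it.
$sample = 'aQQQQbQQQQa';

$count = preg_match_all('/(.+).*\1/s', $sample, $m);

// The first match runs from the leading "a" to the trailing "a" and swallows
// everything in between, so the scan resumes at the end of the string.
echo $count, ' ', implode(',', $m[1]), "\n"; // prints "1 a"
```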
The following may not be ideal for your purposes. You mention wanting to be able to find the longest, then the next longest, etc. But it will find the longest. That will give you an upper limit on how long matches can be. If it finds a 14-character match, you can see how to then embark on a 13-character match, and so on.
Oh, and (due to eagerness) it finds the first match of the longest possible length. But then, how many matches of the form "e…e" would there be in a typical block of English text?
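For the step down to shorter lengths, something like this sketch might do once the upper limit is known. The function name repeats_of_length() and the sample text are hypothetical, and it only reports runs whose second copy does not overlap the first.

```php
<?php
// Hypothetical helper: every distinct run of exactly $n characters that
// occurs again later in $string without overlapping its first occurrence.
function repeats_of_length($string, $n)
{
    $found = array();
    if ($n < 1) return $found;
    $last = strlen($string) - 2 * $n;   // last start that leaves room for two copies
    for ($i = 0; $i <= $last; $i++) {
        // Skip $i characters, capture exactly $n, then look for them again.
        if (preg_match('/^.{'.$i.'}(.{'.$n.'}).*\\1/s', $string, $match)) {
            $found[] = $match[1];
        }
    }
    // The same run can be reported from more than one starting position.
    return array_values(array_unique($found));
}

$sample = 'abcXabcYbcZ';
print_r(repeats_of_length($sample, 3)); // "abc"
print_r(repeats_of_length($sample, 2)); // "ab", "bc"
```

Calling it with 13, then 12, and so on would walk down from a 14-character longest match.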
The text being searched is in $string.
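$strlen = strlen($string);
$longest_match = '';
$longest_matchlen = 0;
for ($i = 0; $i < $strlen; $i++)
{
    // Stop once two copies as long as the current best can no longer fit after position $i.
    if ($i + 2 * $longest_matchlen > $strlen) break;
    // Widest gap that still leaves room for both copies.
    $maxgap = $strlen - $i - 2 * $longest_matchlen;
    // Skip $i characters, capture a run, then look for that same run again later on.
    if (preg_match('/^.{'.$i.'}(.+).{0,'.$maxgap.'}\\1/s', $string, $match))
    {
        if ($longest_matchlen < strlen($match[1]))
        {
            $longest_matchlen = strlen($match[1]);
            $longest_match = $match[1];
        }
    }
}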
The '(.+)' might be expanded to '(.{'.$longest_matchlen.',}.+)' so that once it matches a 5-character dupe, it won't be satisfied until it matches a 6-character dupe. Then the next test on match lengths would not be needed.
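A sketch of that variant, untested beyond small inputs. The sample text is made up; everything else follows the loop above, minus the length comparison.

```php
<?php
$string = 'aXa bcdefbcdef';   // made-up sample text
$strlen = strlen($string);
$longest_match = '';
$longest_matchlen = 0;
for ($i = 0; $i < $strlen; $i++) {
    if ($i + 2 * $longest_matchlen > $strlen) break;
    $maxgap = $strlen - $i - 2 * $longest_matchlen;
    // The capture must now be at least one character longer than the best so far.
    $pattern = '/^.{'.$i.'}(.{'.$longest_matchlen.',}.+).{0,'.$maxgap.'}\\1/s';
    if (preg_match($pattern, $string, $match)) {
        // Any hit is necessarily longer than the previous best, so no test is needed.
        $longest_matchlen = strlen($match[1]);
        $longest_match = $match[1];
    }
}
echo $longest_match, "\n"; // prints "bcdef"
```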
That's something to look into, anyway.
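So is the possibility of getting creative with lookahead assertions. Oh, and preg_match() has had an offset parameter since PHP 4.3.3. That would allow dropping the ".{$i}" business, though the "^" anchor would then have to become "\G", since "^" still means the very start of the string even when an offset is given.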
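Here is roughly what the offset version could look like. It is only a sketch: the sample text is made up, and the pattern is anchored with \G so that the match has to begin exactly at the offset.

```php
<?php
$string = 'aXa bcdefbcdef';   // made-up sample text
$strlen = strlen($string);
$longest_match = '';
$longest_matchlen = 0;
for ($i = 0; $i < $strlen; $i++) {
    if ($i + 2 * $longest_matchlen > $strlen) break;
    $maxgap = $strlen - $i - 2 * $longest_matchlen;
    // The fifth argument is the offset to start from; \G pins the match to it.
    if (preg_match('/\\G(.+).{0,'.$maxgap.'}\\1/s', $string, $match, 0, $i)) {
        if ($longest_matchlen < strlen($match[1])) {
            $longest_matchlen = strlen($match[1]);
            $longest_match = $match[1];
        }
    }
}
echo $longest_match, "\n"; // prints "bcdef"
```

Without \G (or some other anchor) the engine would be free to slide forward from the offset, and the loop would no longer be testing one starting position at a time.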