They've become something of a de facto standard that many languages now use.
[/list=a]
You will probably need to read the [man]PCRE[/man] section (and related function entries) of the PHP manual to fully understand what is happening. Pay special attention to [man]preg_match_all()[/man].
The regex we will use is
/\b(.*a.*)\b/U
The two slashes (/) at the beginning and at the end are delimiters showing the start and end of the regular expression. The U after the last slash is a modifier which tells the regex engine to invert the "greediness" of the quantifiers. This means that a star (*), plus (+) or min/max quantifier ({a,b}) will match as little as possible rather than as much as possible. Right, into the regex. \b means a word boundary, .* means "match any number of any character" (except a newline, by default) and the parentheses ( () ) around the middle part of the regex are there to capture this part into the matches variable.
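To make the "greediness" point concrete, here is a minimal sketch that runs the same pattern with and without the U modifier. The sample string and the variable names ($sample, $greedy, $lazy) are made up for this illustration and are not part of the example that follows.

```php
<?php
// Made-up sample text, chosen only to show what the U modifier changes.
$sample = 'a1b a2b';

// Default (greedy): .* takes as much as it can, so it runs to the LAST b.
preg_match('/a(.*)b/', $sample, $greedy);

// With U the same .* becomes ungreedy and stops at the FIRST b.
preg_match('/a(.*)b/U', $sample, $lazy);

echo $greedy[1], "\n"; // 1b a2
echo $lazy[1], "\n";   // 1
```

Both patterns start matching at the same "a"; the modifier only changes how far the quantifier stretches from there.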
Let's see how this would be used.
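<?php
$str = "black cat sat on the mat";
// U makes the quantifiers ungreedy; the parentheses capture into group 1
$regex = '/\b(.*a.*)\b/U';
if (preg_match_all($regex, $str, $matches)) {
    // $matches[1] holds the captured group from every match
    foreach ($matches[1] as $match) {
        echo($match."<br />\n");
    }
} else {
    echo("NOT found");
}
?>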
Now, when I tested this it returned
black
cat
sat
on the mat
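A result like this is easier to pick apart if you can see exactly where each match starts. The sketch below reruns the same regex with the PREG_OFFSET_CAPTURE flag of preg_match_all(), which returns each match as a text/offset pair. The $found array and the square brackets in the output are additions for this illustration only.

```php
<?php
$str = "black cat sat on the mat";
$regex = '/\b(.*a.*)\b/U';

// PREG_OFFSET_CAPTURE turns every entry into array(text, byte offset).
preg_match_all($regex, $str, $matches, PREG_OFFSET_CAPTURE);

$found = array();
foreach ($matches[1] as $pair) {
    list($text, $offset) = $pair;
    $found[] = array($text, $offset);
    // The brackets make leading spaces visible, which HTML output hides.
    echo "[$text] starts at offset $offset\n";
}
```

On this input the second, third and fourth results each begin with a space, and each starts at the offset where the previous match ended (5, 9 and 13). So the last match begins at the word boundary straight after "sat", and the ungreedy .* then has to swallow " on the m" before it can reach an "a". That would suggest ungreedy only limits how far a quantifier stretches from a given starting point; it does not make the engine hunt for the shortest match anywhere in the string.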
The reason this happened is due to the nature of PCRE, so I'll have to delve a little deeper. Matches are made sequentially: first it matches black, then cat, then sat. Then it hits a word boundary, which is the first part of this regex; every character up to the "a" is a valid dot (.), so they match; the "a" allows the string to pass the condition; the other letters are caught by the other dot; and then the end of the string is considered a word boundary, so this matches .... <bubble's brain ticks for a minute> .... hang on, if it's set to ungreedy shouldn't it backtrack and repeat until it finds the smallest possible result? If there are any regex gurus out there please enlighten me on this point.
Well :o anyway, this does go to illustrate that the first regex you come up with does not always solve the problem (like pretty much everything in programming, or even in life really, but I'm not about to give you a lesson in accepting failure and carrying on, don't worry 😉). As you become more proficient with regular expressions you'll learn different ways of tackling the same problem. I took another look at the regex and realized that it could be done far more elegantly anyway with the following regex.
/\w*a\w*/
All the other concepts have already been covered except two. Firstly, \w denotes any word character (a letter, a digit or an underscore). Secondly, there are no parentheses because none are needed: this regex matches exactly what we want, so we can retrieve the whole match using $matches[0] rather than the parenthesised section using $matches[1]. Here is how it would be implemented.
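<?php
$str = "black cat sat on the mat";
// \w* on either side of the "a" keeps each match inside a single word
$regex = '/\w*a\w*/';
if (preg_match_all($regex, $str, $matches)) {
    // $matches[0] holds the full text of every match
    foreach ($matches[0] as $match) {
        echo($match."<br />\n");
    }
} else {
    echo("NOT found");
}
?>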
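If you want the same search for letters other than "a", the regex can be wrapped in a small function. This is only a sketch: words_containing() is a hypothetical helper name invented here, and it assumes a "word" is whatever \w matches.

```php
<?php
// Hypothetical helper (not a built-in): returns every word in $text
// that contains $letter, using the same \w*a\w* idea as above.
function words_containing($letter, $text)
{
    // preg_quote() escapes the letter in case it is a regex metacharacter.
    $regex = '/\w*' . preg_quote($letter, '/') . '\w*/';
    preg_match_all($regex, $text, $matches);
    // $matches[0] is an empty array when nothing matched.
    return $matches[0];
}

$words = words_containing('a', "black cat sat on the mat");
echo implode("\n", $words), "\n";
```

For the test sentence this gives black, cat, sat and mat, the same four words as the second regex.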
Sorry if I've rambled on too long.
If anyone can spot why the first regex didn't work (even though it was far less elegant) please share.
Cheers
Bubble