Regular expression syntax, please help!

Paul_help_ · Sep 30, 2013

Hi. I'm having difficulty with the code below.

The thing I don't understand is why there need to be 2 backslashes in front of the dollar sign at the beginning of the regular expression:

"/\\$\d+\.\d{2}/"

Why not just one backslash (\$)?

The regular expression also contains a period (.), but the period doesn't need 2 backslashes; it only needs 1. Can someone explain why the dollar sign at the start needs 2 backslashes, but the period only needs one?


$text = "The wholesale price is $89.50.";

// Displays "The wholesale price is [CENSORED]."
echo preg_replace( "/\\$\d+\.\d{2}/", "[CENSORED]", $text );

bradgrafelman · Sep 30, 2013

The actual value of the regexp pattern is this:

/\$\d+\.\d{2}/

To get there, you need literal backslashes in the string. However, the backslash has special meaning in double-quote delimited strings - that's why "\n" is a single character, namely a new line or "line feed" ("LF") character. To ensure that no special escape sequence is triggered, you normally use a double backslash inside such strings to indicate you want a literal backslash there (so "\\n" would be two characters: a backslash followed by the letter 'n').

However, PHP is forgiving and will automatically assume you wanted a literal backslash if the next character doesn't complete any known escape sequence. Thus, "\." and "\\." would yield the exact same value.

To further complicate the issue, note that you don't want variable interpolation to occur in the string, thus you need to escape the dollar sign. (Then again, a better solution might be my personal preference: only use double-quote delimited strings when variable interpolation is desired.)
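
To see what is described above, it can help to print the string values PHP actually builds before the regexp engine ever sees them. This is only a minimal sketch; the variable names ($double, $single) are made up for illustration.

```php
<?php
// Double-quoted: "\\" collapses to one backslash; the dollar sign is
// followed by a backslash, which cannot start a variable name, so it stays
// literal; "\d" and "\." are not PHP escape sequences, so they are kept.
$double = "/\\$\d+\.\d{2}/";

// Single-quoted: no interpolation, and only \\ and \' are escapes.
$single = '/\$\d+\.\d{2}/';

var_dump($double === $single);   // bool(true)
echo $double, "\n";              // /\$\d+\.\d{2}/

// "\n" is a recognised escape (one character); "\\n" is backslash + n.
var_dump(strlen("\n"));          // int(1)
var_dump(strlen("\\n"));         // int(2)

// "\." is not a recognised escape, so it has the same value as "\\.".
var_dump("\." === "\\.");        // bool(true)
```

Both literals end up as the same 14-character pattern, which is why the single-quoted form is often easier to read.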

NogDog · Sep 30, 2013

This is the main reason I normally write my regex string literals in PHP with single quotes around them instead of double quotes.

Paul_help_ · Sep 30, 2013

thanks.

But I still don't understand why the dollar sign needs two backslashes before it?

Paul_help_ · Sep 30, 2013

Would it give the same result if it was just one backslash?

bradgrafelman · Sep 30, 2013

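No. With a single backslash, "\$" is PHP's own escape sequence for a literal dollar sign, so the regexp engine receives an unescaped $ (the end-of-subject anchor) and the pattern won't match. Two backslashes work in this instance only because the character after the dollar sign isn't valid for the beginning of a variable name. If it was, two wouldn't work either because you'd actually need three backslashes. Example: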

$text = 'The wholesale price is $a89.50.'; 

// Displays "The wholesale price is [CENSORED]." 
echo preg_replace( "/\\\$a\d+\.\d{2}/", "[CENSORED]", $text );

Using only two backslashes there would result in an "Undefined variable: a" error message since PHP would attempt to interpolate "$a" inside the string with the (undefined) variable $a.

EDIT: And as an illustration of the preferences of NogDog and myself:

$text = 'The wholesale price is $a89.50.'; 

// Displays "The wholesale price is [CENSORED]." 
echo preg_replace( '/\$a\d+\.\d{2}/', "[CENSORED]", $text );
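
A rough way to compare the variants is to run each one against the original price string. This is a sketch for illustration only; the $patterns array and its labels are invented here and are not from the posts above.

```php
<?php
$text = 'The wholesale price is $89.50.';

$patterns = [
    'two backslashes, double quotes' => "/\\$\d+\.\d{2}/",
    'one backslash, single quotes'   => '/\$\d+\.\d{2}/',
    // "\$" is PHP's escape for a plain dollar sign, so the regexp engine
    // receives an unescaped $, which is the end-of-subject anchor.
    'one backslash, double quotes'   => "/\$\d+\.\d{2}/",
];

foreach ($patterns as $label => $pattern) {
    echo $label, ': ', $pattern, ' => ',
         preg_replace($pattern, '[CENSORED]', $text), "\n";
}
```

The first two variants should print the censored sentence. The third leaves the text unchanged, because no digits can follow the end of the subject string.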

Paul_help_ · Sep 30, 2013

thanks man.

I'm also analyzing this piece of code:

$text = "Author: Steinbeck, John";
// Displays "Author: John Steinbeck"
echo preg_replace( "/(\w+), (\w+)/", "$2 $1", $text );

I was just wondering: are commas (,) and colons (:) considered word characters (\w) in regular expressions?

bradgrafelman · Sep 30, 2013

From one of my favorite regexp resources here:

Regular-Expressions.info wrote:
\w stands for "word character". It always matches the ASCII characters [A-Za-z0-9_].

EDIT: And, of course, you could always test it yourself:

$test = '1 2 A B ? ! ; , .';
preg_match_all( '/\w/', $test, $matches );
print_r( $matches );

output:

Array
(
    [0] => Array
        (
            [0] => 1
            [1] => 2
            [2] => A
            [3] => B
        )

)

Paul_help_ · Sep 30, 2013

But what character class do colons and commas come under? Are they \W (uppercase W instead of lowercase)?

bradgrafelman · Sep 30, 2013

Paul_help_ wrote:
But what character class do colons and commas come under?

You're implying that a given character must be part of some "character class" (which, by the way, is not what "\w" is -- it's a shorthand way of writing the actual character class). You also didn't say "classes" (plural) as if you're implying there is only one class that could include colons and commas.

Neither is true.

Paul_help_ wrote:
Are they \W (uppercase W instead of lowercase)?

Do you know what the upper-case variant means? (If not, go back and re-read the page I linked you to above.)

In closing, one actual character class (i.e. a left square bracket, some characters, and a right square bracket) that would include "colons and commas" would be:

[\w:,]
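
As a small sketch of how the \w and \W shorthands differ from a custom class like the one above (the $chars list is just sample input chosen for illustration):

```php
<?php
$chars = ['a', '7', '_', ',', ':'];

foreach ($chars as $c) {
    printf(
        "'%s'  \\w: %d  \\W: %d  [\\w:,]: %d\n",
        $c,
        preg_match('/^\w$/', $c),      // 1 for letters, digits, underscore
        preg_match('/^\W$/', $c),      // 1 for anything \w does not match
        preg_match('/^[\w:,]$/', $c)   // 1 for \w plus colon and comma
    );
}
```

With the default settings, the comma and colon should show 0 under \w and 1 under both \W and [\w:,], while the letter, digit and underscore show 1 under \w and 0 under \W.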

Paul_help_ · Oct 1, 2013

The reason I'm asking whether \w covers commas and colons is this:

$text = "Author: Steinbeck, John";
// Displays "Author: John Steinbeck"
echo preg_replace( "/(\w+), (\w+)/", "$2 $1", $text );

The above snippet outputs:

"Author: John Steinbeck"

If you notice, the comma (which was originally at the end of the word 'Steinbeck') disappears.

And if I change the regular expression to:

"/(\w+): (\w+)/" (replacing the comma with a colon)

it outputs:

Steinbeck Author, John

As you can see, the colon disappears, and the comma stays put and doesn't move with the word 'Steinbeck' (which it originally followed).

I understand the code, but I don't understand why the comma disappears if I include it in the regular expression (and the same with the colon), and why the comma doesn't change position with the word Steinbeck.

NogDog · Oct 1, 2013

As the manual states (my emphasis):

"A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word". The definition of letters and digits is controlled by PCRE's character tables, and may vary if locale-specific matching is taking place. For example, in the "fr" (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w."

Unless your locale setting uses some language I'm unaware of that uses the comma character as a letter or digit, then no, a comma will not be included as a "word character".

johanafm · Oct 1, 2013

…and to elaborate on the commas and colons

Initially you had
1.

$text = "Author: Steinbeck, John";
$pattern = "/(\w+), (\w+)/";

Which you changed to
2.

# $text is unchanged
$text = "Author: Steinbeck, John";
$pattern = "/(\w+): (\w+)/";

And then the output from 1. and 2. above are

Author: John Steinbeck
Steinbeck Author, John

If you have a closer look at the end result, you will find that it's not only the : that disappears, but that you actually have a completely different ordering of words…

Now, to make it easier to see what happens, use preg_match instead to see what the pattern and its capturing subpatterns actually match

if (preg_match($pattern, $text, $m))
    printf('<pre>%s</pre>', print_r($m,1));

Which for cases 1. and 2. above will yield

Array
(
    [0] => Steinbeck, John
    [1] => Steinbeck
    [2] => John
)

Array
(
    [0] => Author: Steinbeck
    [1] => Author
    [2] => Steinbeck
)

And what preg_replace does is:
Inside the string $text (Author: Steinbeck, John)
Replace everything matched (the [0] element above)
With "$2 $1" (the second matched subpattern followed by the first matched subpattern - elements [2] and [1] above)

So the two cases are
1.
Inside the string "Author: Steinbeck, John"
Replace "Steinbeck, John"
with "John Steinbeck"

2.
Inside the string "Author: Steinbeck, John"
Replace "Author: Steinbeck"
with "Steinbeck Author"
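
The two cases can be reproduced directly. The last call is an extra, hypothetical variation (not from the posts above) showing that literal text in the pattern is part of the match and so gets replaced, which means it only survives if the replacement string puts it back.

```php
<?php
$text = 'Author: Steinbeck, John';

// Case 1: the match is "Steinbeck, John"; the comma and space are part of it.
echo preg_replace('/(\w+), (\w+)/', '$2 $1', $text), "\n";
// Author: John Steinbeck

// Case 2: the match is "Author: Steinbeck"; ", John" is never touched.
echo preg_replace('/(\w+): (\w+)/', '$2 $1', $text), "\n";
// Steinbeck Author, John

// Variation: write the colon back in the replacement to keep it.
echo preg_replace('/(\w+): (\w+)/', '$2: $1', $text), "\n";
// Steinbeck: Author, John
```

Single-quoted strings are used here so that neither the backslashes nor the $1 and $2 references need any extra escaping.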
