[RESOLVED] Transforming ereg_replace into preg_replace

vexx · Sep 22, 2008

I have the following:

$nice_title = ereg_replace("[^[:alnum:]]", "-", $title);
$nice_title = ereg_replace("-+",'-',$nice_title);

I have read on the internet that preg_replace is recommended these days because it is much faster, but being a newbie I couldn't figure out how to convert those two lines.

Tnx in advance

nrg_alpha · Sep 22, 2008

You can use these:

$nice_title = preg_replace('#[^a-z0-9]#i', '-', $title); 
$nice_title = preg_replace('#[-+]#','-',$nice_title);

You'll notice the # characters at either end of the pattern; these are delimiters. The preg functions use Perl Compatible Regular Expressions (PCRE) and, as such, require delimiters (as Perl does). You can use any non-alphanumeric, non-whitespace ASCII character except a backslash.
Typically you'll see '/' characters as delimiters, but I don't like using them: if I need to include that character in my pattern, I have to escape it like so: '\/'. So here I simply use the hash character.
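
To illustrate the delimiter point, here is a minimal sketch. The sample string and variable names are made up for the example and are not from the thread.

```php
<?php
// Hypothetical sample input containing slashes.
$path = 'usr/local/bin';

// With '/' as the delimiter, a literal slash in the pattern must be escaped.
$with_slash_delim = preg_replace('/\//', '-', $path);

// With '#' as the delimiter, the slash needs no escaping.
$with_hash_delim = preg_replace('#/#', '-', $path);

echo $with_slash_delim, "\n"; // usr-local-bin
echo $with_hash_delim, "\n";  // usr-local-bin
```

Both calls do the same work; only the readability of the pattern differs.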

vexx · Sep 23, 2008

thank you..one question tho..this preg_replace does the same thing as the one posted by you?

preg_replace('#[^\w\']#', '-', $title)

nrg_alpha · Sep 24, 2008

vexx;10887430 wrote:
thank you..one question tho..this preg_replace does the same thing as the one posted by you?
preg_replace('#[^\w\']#', '-', $title)

I am assuming you are comparing yours to my first example:

preg_replace('#[^a-z0-9]#i', '-', $title)

and the answer is no, it is not the same thing.

In your example:

preg_replace('#[^\w\']#', '-', $title)

what you are basically saying is:
replace anything that is not a-zA-Z0-9_ or ' (quote).

So understand that \w (word character) encompasses a-zA-Z0-9_ (plus other characters depending on your locale). What I did in my initial example is keep only letters and digits and replace everything else. You'll notice the i modifier after the closing delimiter; it makes the match case insensitive, so a-z covers both lowercase and uppercase letters.
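
A small sketch of the difference between the two patterns, assuming a made-up title that contains both an underscore and a quote:

```php
<?php
// Hypothetical title, chosen to contain an underscore and a quote.
$title = "Bob's my_file v2!";

// Negated class of letters and digits: everything else becomes '-'.
$strict = preg_replace('#[^a-z0-9]#i', '-', $title);

// \w also covers the underscore, and the quote is listed explicitly,
// so both of those characters survive.
$loose = preg_replace('#[^\w\']#', '-', $title);

echo $strict, "\n"; // Bob-s-my-file-v2-
echo $loose, "\n";  // Bob's-my_file-v2-
```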

Cheers,

NRG

vexx · Sep 24, 2008

heh..tnx alot for clarifying ...2 last questions tho..what's the "i" for in the expression:

$nice_title = preg_replace('#[^a-z0-9]#i', '-', $title);

couldn't find anything related on the net regarding this

also, does this ^a-z0-9 replaces special chars such as ! ' etc? or only numbers

nrg_alpha · Sep 24, 2008

As for your first question, if you re-read my last response with regards to the i in the pattern:

you'll notice the i modifier after the closing delimiter. This means case insensitive (so it will check for lowercase and uppercase letters)

As for your second question: a-z and 0-9 are ranges, covering the letters a through z and the digits 0 through 9. The leading ^ negates the class, so [^a-z0-9] matches every character that is not listed. Since ! and ' are not listed, they are already matched and replaced. If you instead want to keep the ! and ' characters, you simply add them to the character class:

$nice_title = preg_replace('#[^a-z0-9!\']#i', '-', $title);
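
One way to check what each class does with ! and ' is to run both patterns on a short string. The input below is invented for the test.

```php
<?php
// Hypothetical input, not from the thread.
$title = "Don't stop!";

// '!' and the quote are not listed in the negated class,
// so they match and are replaced.
$replaced = preg_replace('#[^a-z0-9]#i', '-', $title);

// Listing them inside the negated class excludes them from the match,
// so they are kept.
$kept = preg_replace('#[^a-z0-9!\']#i', '-', $title);

echo $replaced, "\n"; // Don-t-stop-
echo $kept, "\n";     // Don't-stop!
```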

I would seriously recommend having a look at this.

You can also look here.

Regex is really worth going through and learning IMO. It will come in very handy when you need to use it.

Hope this all helps.

Cheers,

NRG

vexx · Sep 24, 2008

tnx alot, i appreciate your help

nrg_alpha · Sep 24, 2008

No problem.

vexx · Sep 24, 2008

made a small test now, a problem comes up and i don't know why

test title is: Test'f

if i use

$nice_title = ereg_replace("[^[:alnum:]]", "-", $title);
$nice_title = ereg_replace("-+",'-',$nice_title);

The nice_title is returned ok, test-f

if i use

$nice_title = preg_replace('#[^a-z0-9]#i', '-', $title);
$nice_title = preg_replace('#[-+]#','-',$nice_title);

nice_title returns: test--f

laserlight · Sep 24, 2008

The second line should be:

$nice_title = preg_replace('#-+#','-',$nice_title);

[-+] specifies a character class that matches '-' and '+', but you actually want to match one or more '-'.
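
The difference is easy to see on a string that has both a run of hyphens and a literal plus. This is a sketch with a made-up input value:

```php
<?php
// Hypothetical intermediate value with a run of hyphens and a literal plus.
$nice_title = 'a---b+c';

// Character class: each single '-' or '+' is replaced on its own,
// so runs of hyphens are not collapsed.
$with_class = preg_replace('#[-+]#', '-', $nice_title);

// Quantifier: one or more consecutive '-' collapse into a single '-'.
$with_quantifier = preg_replace('#-+#', '-', $nice_title);

echo $with_class, "\n";      // a---b-c
echo $with_quantifier, "\n"; // a-b+c
```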

vexx · Sep 24, 2008

laserlight;10887537 wrote:
The second line should be:
$nice_title = preg_replace('#-+#','-',$nice_title);
[-+] specifies a character class that matches '-' and '+', but you actually want to match one or more '-'.

tnx alot, the conversion is identical now and works as intended (hopefully)...tnx both of you!

laserlight · Sep 24, 2008

No problem, and remember to mark this thread as resolved using the thread tools

A minor point: the first line can easily be simplified slightly to:

$nice_title = preg_replace('#[^a-z\d]#i', '-', $title);

\d matches any decimal digit.
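
Putting the two corrected lines together, a minimal sketch could look like this. The function name make_nice_title and the second sample title are assumptions for the example; the patterns are the ones given above.

```php
<?php
// Hypothetical helper name; the two patterns are the ones from this thread.
function make_nice_title(string $title): string
{
    // Replace every character that is not an ASCII letter or digit.
    $nice_title = preg_replace('#[^a-z\d]#i', '-', $title);

    // Collapse runs of hyphens into a single hyphen.
    return preg_replace('#-+#', '-', $nice_title);
}

echo make_nice_title("Test'f"), "\n";         // Test-f
echo make_nice_title('Hello,  World!'), "\n"; // Hello-World-
```

Note that the patterns do not change letter case or strip a trailing hyphen; that would need extra steps not shown here.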

vexx · Sep 24, 2008

laserlight;10887540 wrote:
No problem, and remember to mark this thread as resolved using the thread tools

A minor point: the first line can easily be simplified slightly to:
$nice_title = preg_replace('#[^a-z\d]#i', '-', $title);
\d matches any decimal digit.

tnx for the tip...this change has any effect on speed or its just more handy without affecting performance?

also, as an exercise, wanted to confirm with you, kinda scared dunno why

$nice_title = preg_replace('#[^a-z0-9]#i', '', trim($title));
$nice_title = preg_replace('# +#','-',trim($nice_title));

is the same with this one right?

$nice_title = ereg_replace("[^[:alnum:]]", " ", trim($title));
$nice_title = ereg_replace(" +",'-',trim($nice_title));

nrg_alpha · Sep 24, 2008

Vexx, why not simply do a small sample and try it?

$nice_title = ' This is a + space. ';
echo $nice_title . '<br />';
$nice_title = preg_replace('# +#','-',trim($nice_title));
echo $nice_title;

Try manipulating the pattern (and the initial value of $nice_title) and see what the outcome is? Experimentation on small samples really reveals a lot about what it is you are trying to achieve. If you don't get the desired effect, try tweaking the pattern.
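
As an example of such an experiment, the sketch below runs both of the variants from the question on the same sample. Since ereg_replace was removed in PHP 7, the ereg variant is re-expressed here with preg_replace and the POSIX class [:alnum:], which is an assumption about equivalence rather than the original code. The function names are made up.

```php
<?php
// Variant A: the preg version from the question
// (non-alphanumerics are deleted outright).
function variant_a(string $title): string
{
    $nice_title = preg_replace('#[^a-z0-9]#i', '', trim($title));
    return preg_replace('# +#', '-', trim($nice_title));
}

// Variant B: the ereg version, rewritten with preg_replace
// (non-alphanumerics become spaces, which then turn into hyphens).
function variant_b(string $title): string
{
    $nice_title = preg_replace('#[^[:alnum:]]#', ' ', trim($title));
    return preg_replace('# +#', '-', trim($nice_title));
}

$sample = ' This is a + space. ';
echo variant_a($sample), "\n"; // Thisisaspace
echo variant_b($sample), "\n"; // This-is-a-space
```

The outputs differ because variant A replaces with an empty string, so no spaces are left for the second line to convert.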

On a note related to your pattern above: while you can use an actual blank space as a space, I tend to use \x20 instead. It just seems strange to see an actual space instead of its hexadecimal value. Should you come back to this pattern later on, it may be tricky to notice that you have a space in there, but if you get into the habit of using \x20, this at least gives you a heads-up as a programmer that there is a use of an explicit space there. Nothing wrong with using an actual space, granted; \x20 is just more clear and demonstrative IMO.

nrg_alpha · Sep 24, 2008

laserlight;10887537 wrote:
The second line should be:
$nice_title = preg_replace('#-+#','-',$nice_title);
[-+] specifies a character class that matches '-' and '+', but you actually want to match one or more '-'.

Ah, I was assuming the OP wanted to find a dash or a + and replace it with a dash (come to think of it, that was a bad assumption, as there is no need to replace a dash with a dash). My mistake.

laserlight · Sep 25, 2008

nrg_alpha wrote:
On a note related to your pattern above: while you can use an actual blank space as a space, I tend to use \x20 instead. It just seems strange to see an actual space instead of its hexadecimal value. Should you come back to this pattern later on, it may be tricky to notice that you have a space in there, but if you get into the habit of using \x20, this at least gives you a heads-up as a programmer that there is a use of an explicit space there. Nothing wrong with using an actual space, granted; \x20 is just more clear and demonstrative IMO.

On the contrary, I think that using a space literal is better, since it is obviously a space. Not everyone can recall that 20 in hexadecimal is the ASCII value for a space.

nrg_alpha · Sep 25, 2008

laserlight;10887577 wrote:
On the contrary, I think that using a space literal is better, since it is obviously a space. Not everyone can recall that 20 in hexadecimal is the ASCII value for a space.

Well, this is where we differ.
I have seen patterns where a literal blank space was used (nothing wrong with that per se). I have also seen spaces used not as a space character to match, but to space things out for readability using the free-spacing x modifier.
To me (and this is just my opinion), it just seems odd having literal blank spaces. Memorizing \x20 is not hard. If I can do it, anyone can (this is not to say my way is right, and everyone else is wrong).

When I see the \x20, I know without a shadow of a doubt what it means (whereas some people may insert blank spaces as a way to separate subpattern elements for more readability, all the while neglecting to use the x modifier). Again, it isn't necessarily bad to use literal spacing; it is more of a preference thing. But I do find it easier to use \x20 instead, as it just stands out more IMO.
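
A short sketch of how a literal space and \x20 behave with and without the x modifier. The sample text and variable names are invented for the example.

```php
<?php
// Hypothetical sample text.
$text = 'a b';

// Without the x modifier, a literal space and \x20 match the same character.
$literal = preg_replace('# #', '-', $text);
$hex     = preg_replace('#\x20#', '-', $text);

// With the x modifier, unescaped whitespace in the pattern is ignored,
// so the pattern 'a b' means 'ab' and no longer matches the space.
$freespacing_literal = preg_match('#a b#x', $text);

// \x20 is still treated as a real space under the x modifier.
$freespacing_hex = preg_match('#a\x20b#x', $text);

echo $literal, ' ', $hex, "\n";                         // a-b a-b
echo $freespacing_literal, ' ', $freespacing_hex, "\n"; // 0 1
```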

laserlight · Sep 25, 2008

nrg_alpha wrote:
When I see the \x20, I know without a shadow of a doubt what it means...(where as some people may insert blank spaces as a way to separate subpattern elements for more readability, all the while neglecting to use the x modifier).

That sounds like a good reason, except that it means preferring a form that requires lookup (either mentally or by referring to a table) in order to avoid a mistake because a modifier is missing. Consider the pattern: '/a+/'. It is also conceivable that the + was intended to be a literal, in which case '/a\x2b/' would have avoided this mistake entirely. This would imply that one would need to have memorised the ASCII values of various other symbols which are significant in regex pattern syntax in order to be consistent.

Admittedly, I am influenced by my C and C++ background, where we shun code like this:

char a = 32;

in favour of:

char a = ' ';

on the basis of readability.

nrg_alpha · Sep 25, 2008

laserlight;10887597 wrote:
That sounds like a good reason, except that it means preferring a form that requires lookup (either mentally or by referring to a table) in order to avoid a mistake because a modifier is missing.

Agreed, this does mean some form of mental (or table) lookup, to be sure. But in my case, I only bothered to memorize one character, and that is the space. For all other characters, I don't bother. I think another part of the reasoning behind the space is the \s notation. Obviously, \s is a shorthand character class for many kinds of whitespace (spaces, tabs, newlines, carriage returns, etc.), whereas \x20 matches only the space itself. Since I find it strange seeing a literal space, \x20 makes it more immediately clear that a space is meant (and especially not one intended for use with the free-spacing / commenting x modifier).

laserlight;10887597 wrote:
Consider the pattern: '/a+/'. It is also conceivable that the + was intended to be a literal, in which case '/a\x2b/' would have avoided this mistake entirely. This would imply that one would need to have memorised the ASCII values of various other symbols which are significant in regex pattern syntax in order to be consistent.

Well, in my case as stated earlier, I only bothered to memorize one. In the case of using the plus as a literal instead of a +, there are options:

'/a[+]/' - since most metacharacters lose their special meaning within a character class, this solves the issue of intending a + as a literal.

'/a\Q+\E/' also solves this issue, as obviously anything encapsulated within \Q \E is literal.

In my opinion, it is best to learn and understand the nuances of regular expressions and use those to your advantage than to start memorizing a complete slew of hex characters for the sake of literal translations (I'm not implying you don't know regex (as I know you do), just for people in general).
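
The options for a literal plus can be compared with a quick count of matches. The sample text is made up, and the backslash escape and preg_quote() lines are additions that were not mentioned in the thread.

```php
<?php
// Hypothetical sample text: a run of 'a' followed by a literal 'a+'.
$text = 'aa a+';

// '+' as a quantifier: matches 'aa' and then 'a'.
$quantifier = preg_match_all('/a+/', $text);

// '+' as a literal, via a character class and via \Q...\E.
$in_class = preg_match_all('/a[+]/', $text);
$quoted   = preg_match_all('/a\Q+\E/', $text);

// Additions not from the thread: a backslash escape, and preg_quote()
// for building the escaped form programmatically.
$escaped     = preg_match_all('/a\+/', $text);
$via_quoting = preg_match_all('/a' . preg_quote('+', '/') . '/', $text);

echo $quantifier, "\n"; // 2
echo $in_class, ' ', $quoted, ' ', $escaped, ' ', $via_quoting, "\n"; // 1 1 1 1
```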
