I just ran a small test and hit a problem I can't explain.

The test title is: Test'f

If I use

$nice_title = ereg_replace("[^[:alnum:]]", "-", $title);
$nice_title = ereg_replace("-+",'-',$nice_title); 

nice_title comes back fine: test-f

But if I use

$nice_title = preg_replace('#[^a-z0-9]#i', '-', $title);
$nice_title = preg_replace('#[-+]#','-',$nice_title); 

nice_title returns: test--f

    The second line should be:

    $nice_title = preg_replace('#-+#','-',$nice_title);

    [-+] specifies a character class that matches '-' and '+', but you actually want to match one or more '-'.
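To make the difference concrete, here is a minimal sketch comparing the two patterns. The sample title "Rock + Roll" is a made-up input chosen only because it produces a run of adjacent dashes; it is not from the original post.

```php
<?php
// Hypothetical sample title (an assumption for illustration only).
$title = "Rock + Roll";

// Step 1: replace every non-alphanumeric character with a dash.
// The space, the plus and the second space each become one dash.
$dashed = preg_replace('#[^a-z0-9]#i', '-', $title);

// Character class [-+]: matches ONE character that is either '-' or '+'.
// Each dash is replaced by a dash, so nothing collapses.
$with_class = preg_replace('#[-+]#', '-', $dashed);

// Quantifier -+: matches a run of one or more dashes and replaces
// the whole run with a single dash.
$with_quantifier = preg_replace('#-+#', '-', $dashed);

echo $dashed, "\n";          // Rock---Roll
echo $with_class, "\n";      // Rock---Roll
echo $with_quantifier, "\n"; // Rock-Roll
```

The only change between the two calls is dropping the square brackets, which turns the + from a literal member of a character class back into a quantifier.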

      laserlight;10887537 wrote:

      The second line should be:

      $nice_title = preg_replace('#-+#','-',$nice_title);

      [-+] specifies a character class that matches '-' and '+', but you actually want to match one or more '-'.

      Thanks a lot, the conversion is identical now and works as intended (hopefully). Thanks to both of you!

        No problem, and remember to mark this thread as resolved using the thread tools 🙂

        A minor point: the first line can easily be simplified slightly to:

        $nice_title = preg_replace('#[^a-z\d]#i', '-', $title);

        \d matches any decimal digit.
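A quick way to check that the two spellings behave the same is to run both over a few inputs. This is only a sketch: the sample strings are invented, and it assumes the pattern is used as shown, without the u modifier.

```php
<?php
// Hypothetical sample titles (assumptions for illustration only).
$samples = ["Test'f", "PHP 5.3 & regex", "a1-b2_c3"];

$all_same = true;
foreach ($samples as $s) {
    $with_range = preg_replace('#[^a-z0-9]#i', '-', $s); // explicit 0-9 range
    $with_d     = preg_replace('#[^a-z\d]#i', '-', $s);  // \d shorthand

    if ($with_range !== $with_d) {
        $all_same = false;
    }
    echo $s, ' => ', $with_d, "\n";
}

echo $all_same ? "identical results\n" : "results differ\n";
```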

          laserlight;10887540 wrote:

          No problem, and remember to mark this thread as resolved using the thread tools 🙂

          A minor point: the first line can easily be simplified slightly to:

          $nice_title = preg_replace('#[^a-z\d]#i', '-', $title);

          \d matches any decimal digit.

          Thanks for the tip. Does this change have any effect on speed, or is it just handier without affecting performance?

          Also, as an exercise, I wanted to confirm something with you (kind of scared, don't know why 🙂):

          $nice_title = preg_replace('#[^a-z0-9]#i', '', trim($title));
          $nice_title = preg_replace('# +#','-',trim($nice_title));

          is the same as this one, right?

          $nice_title = ereg_replace("[^[:alnum:]]", " ", trim($title));
          $nice_title = ereg_replace(" +",'-',trim($nice_title));

            Vexx, why not simply do a small sample and try it?

            $nice_title = ' This is a + space. ';
            echo $nice_title . '<br />';
            $nice_title = preg_replace('# +#','-',trim($nice_title));
            echo $nice_title;
            

            Try manipulating the pattern (and the initial value of $nice_title) and see what the outcome is. Experimenting on small samples reveals a lot about what you are trying to achieve. If you don't get the desired effect, tweak the pattern.
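In that spirit, here is a small sketch that runs both variants from the question side by side. It does not call ereg_replace (the ereg functions were removed in PHP 7); instead, variant B reuses preg_replace with the ' ' replacement taken from the ereg snippet, on the assumption that [^[:alnum:]] and [^a-z0-9] with the i modifier match the same characters for plain ASCII input. The sample title is invented.

```php
<?php
// Hypothetical sample title (an assumption for illustration only).
$title = "  My first post: hello, world!  ";

// Variant A, as in the preg version of the question:
// non-alphanumerics (including spaces) are removed outright.
$a = preg_replace('#[^a-z0-9]#i', '', trim($title));
$a = preg_replace('# +#', '-', trim($a));

// Variant B, mirroring the ereg version of the question:
// non-alphanumerics become spaces, then runs of spaces become one dash.
$b = preg_replace('#[^a-z0-9]#i', ' ', trim($title));
$b = preg_replace('# +#', '-', trim($b));

echo $a, "\n"; // Myfirstposthelloworld
echo $b, "\n"; // My-first-post-hello-world
```

In this sketch the two variants differ because of the replacement string in the first call ('' versus ' '), not because of preg versus ereg: once the spaces have been stripped, the second call has nothing left to turn into dashes.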

            On a note related to your pattern above: while you can use an actual blank space as a space, I tend to use \x20 instead. It just seems strange to see an actual space instead of its hexadecimal value. Should you come back to this pattern later on, it may be tricky to notice that you have a space in there, but if you get into the habit of using \x20, it at least gives you a heads-up as a programmer that an explicit space is being used. Nothing wrong with using an actual space, granted; \x20 is just clearer and more demonstrative IMO.

              laserlight;10887537 wrote:

              The second line should be:

              $nice_title = preg_replace('#-+#','-',$nice_title);

              [-+] specifies a character class that matches '-' and '+', but you actually want to match one or more '-'.

              Ah, I was assuming the OP wanted to find a dash or a + and replace it with a dash (come to think of it, that was a bad assumption, as there is no need to replace a dash with a dash). My mistake.

                On a note related to your pattern above: while you can use an actual blank space as a space, I tend to use \x20 instead. It just seems strange to see an actual space instead of its hexadecimal value. Should you come back to this pattern later on, it may be tricky to notice that you have a space in there, but if you get into the habit of using \x20, it at least gives you a heads-up as a programmer that an explicit space is being used. Nothing wrong with using an actual space, granted; \x20 is just clearer and more demonstrative IMO.

                On the contrary, I think that using a space literal is better, since it is obviously a space. Not everyone can recall that 20 in hexadecimal is the ASCII value for a space.

                  laserlight;10887577 wrote:

                  On the contrary, I think that using a space literal is better, since it is obviously a space. Not everyone can recall that 20 in hexadecimal is the ASCII value for a space.

                  Well, this is where we differ.
                  I have seen patterns where a literal blank space was used (nothing wrong with that per se). I have also seen spaces used not as a space character to match, but to space things out for readability with the free-spacing x modifier.
                  To me (and this is just my opinion), it just seems odd having literal blank spaces. Memorizing \x20 is not hard. If I can do it, anyone can (this is not to say my way is right and everyone else is wrong).

                  When I see \x20, I know without a shadow of a doubt what it means (whereas some people may insert blank spaces to separate subpattern elements for readability, all the while neglecting to use the x modifier). Again, it isn't necessarily bad to use literal spaces; it is more of a preference thing. But I do find \x20 easier, as it just stands out more IMO.
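The interaction with the x modifier can be sketched as follows. The subject string and patterns are invented for illustration; the behaviour shown relies on PCRE ignoring unescaped whitespace in a pattern under x, except inside a character class.

```php
<?php
// Hypothetical subject string (an assumption for illustration only).
$subject = "foo bar";

// No x modifier: the literal space in the pattern matches a space.
$plain = preg_match('/foo bar/', $subject);

// x modifier: unescaped whitespace in the pattern is ignored, so this
// is effectively /foobar/ and no longer matches "foo bar".
$extended_literal = preg_match('/foo bar/x', $subject);

// \x20 is still a space under x; the surrounding blanks are only layout.
$extended_hex = preg_match('/foo \x20 bar/x', $subject);

// A space inside a character class is also kept under x.
$extended_class = preg_match('/foo [ ] bar/x', $subject);

echo $plain, $extended_literal, $extended_hex, $extended_class, "\n"; // 1011
```

Both \x20 and [ ] survive the x modifier, so either works for someone who wants an explicit space in a free-spacing pattern.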

                    nrg_alpha wrote:

                    When I see \x20, I know without a shadow of a doubt what it means (whereas some people may insert blank spaces to separate subpattern elements for readability, all the while neglecting to use the x modifier).

                    That sounds like a good reason, except that it means preferring a form that requires lookup (either mentally or by referring to a table) in order to avoid a mistake because a modifier is missing. Consider the pattern: '/a+/'. It is also conceivable that the + was intended to be a literal, in which case '/a\x2b/' would have avoided this mistake entirely. This would imply that one would need to have memorised the ASCII values of various other symbols which are significant in regex pattern syntax in order to be consistent.

                    Admittedly, I am influenced by my C and C++ background, where we shun code like this:

                    char a = 32;

                    in favour of:

                    char a = ' ';

                    on the basis of readability.

                      laserlight;10887597 wrote:

                      That sounds like a good reason, except that it means preferring a form that requires lookup (either mentally or by referring to a table) in order to avoid a mistake because a modifier is missing.

                      Agreed, this does mean some form of lookup (mental or from a table) to be sure. But in my case, I only bothered to memorize one character, and that is the space. For all other characters, I don't bother. I think another part of my reasoning comes from the \s notation. This is a shorthand character class for many kinds of whitespace (spaces, tabs, newlines, carriage returns, etc.). Since I find it strange to see a literal space, \x20 makes it explicit that only the space character is meant, not the other whitespace that \s would also cover. It is more immediately clear that it is a space (and especially not one intended for use with the free-spacing / commenting x modifier).

                      laserlight;10887597 wrote:

                      Consider the pattern: '/a+/'. It is also conceivable that the + was intended to be a literal, in which case '/a\x2b/' would have avoided this mistake entirely. This would imply that one would need to have memorised the ASCII values of various other symbols which are significant in regex pattern syntax in order to be consistent.

                      Well, in my case, as stated earlier, I only bothered to memorize one. For using the plus as a literal instead of as a quantifier, there are options:

                      '/a[+]/' - since most metacharacters, including +, lose their special meaning within a character class, this solves the issue of intending a + as a literal.

                      '/a\Q+\E/' also solves this issue, as anything enclosed between \Q and \E is treated as literal.

                      In my opinion, it is better to learn and understand the nuances of regular expressions and use them to your advantage than to start memorizing a whole slew of hex codes for the sake of literal translations (I'm not implying you don't know regex, as I know you do; I mean people in general).
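The options for a literal + can be compared in a short sketch. The subject string is invented; the backslash escape and preg_quote() are not mentioned in the posts above and are included here only as further standard alternatives.

```php
<?php
// Hypothetical subject string (an assumption for illustration only).
$subject = "a+ aa";

// + as a quantifier: matches the run "a" and the run "aa".
$quantifier = preg_match_all('/a+/', $subject);

// + as a literal, three equivalent spellings: only "a+" matches.
$in_class = preg_match_all('/a[+]/', $subject);
$quoted   = preg_match_all('/a\Q+\E/', $subject);
$escaped  = preg_match_all('/a\+/', $subject);

// preg_quote() escapes regex metacharacters in arbitrary text; the second
// argument names the delimiter so that it is escaped as well.
$built = preg_match_all('/a' . preg_quote('+', '/') . '/', $subject);

echo "$quantifier $in_class $quoted $escaped $built\n"; // 2 1 1 1 1
```

None of the literal spellings requires remembering that + is 2b in hexadecimal.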
