Can someone help me figure something out about regexes?

Say I have something like:
$str = "+ +z a"

I want to match words up to a whitespace or end of string. They may be preceded by a +, but I do not want to match a + alone. A word consists of any characters except + and whitespace.

So I want to match +z (the first valid word in the string).

I tried this:
/[+]*\S+/

It matched "+". Not what I need.

I tried this:
/[+]*[^+\s]+/

It matched "+z".

It is like \S is a valid match even if it does not match anything. Why is that? Is there a way to change this behavior?

Thanks!

    Seems to me \S+ was a valid match, of "+". [+]* was a valid match of nothing (* means 0 or more).
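To see which part of the pattern consumed what, here is a minimal sketch. It uses Python's `re` module as an assumption, since the thread does not say which language the pattern runs in; the capture groups and the variable names are added purely for illustration.

```python
import re

text = "+ +z a"

# Same pattern as in the question, with each half wrapped in a capture
# group so we can see what it consumed.
m = re.search(r"([+]*)(\S+)", text)

whole = m.group(0)   # the overall match
prefix = m.group(1)  # what [+]* consumed
word = m.group(2)    # what \S+ consumed

print(repr(whole), repr(prefix), repr(word))
```

Here `prefix` comes back empty and `word` holds the "+": since "+" is itself a non-whitespace character, \S+ is free to take it once [+]* gives it up.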

      Originally posted by mtmosier
      Seems to me \S+ was a valid match, of "+". [+]* was a valid match of nothing (* means 0 or more).

      Your point is quite interesting. I hadn't looked at things from this angle.

      What bothers me is: my quantifiers are greedy. If my string is:
      "++ +z"

      Then the following regex should match all pluses, then short of a \S match it should give up on the leading "++" and match on "+z". Don't you think so?
      /[+]*\S+/

      My program works well already, but I'd like to fine tune my regex so it does not return useless tokens.

      Thanks for your help :-)

        What bothers me is: my quantifiers are greedy. If my string is:
        "++ +z"

        Then the following regex should match all pluses, then short of a \S match it should give up on the leading "++" and match on "+z". Don't you think so?
        /[+]*\S+/

        They're greedy, but they're also persistent. When [+]* matches "++", but \S+ fails to match the space, it'll go back and try matching less for [+]*, namely only "+". At that point it matches the second "+" with \S+. It won't move on to a later starting position until it exhausts all possibilities at the current one.
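The backtracking described above can be observed directly. This is a minimal sketch, again assuming Python's `re` module (a backtracking engine) as a stand-in for whatever language the original code uses; the capture groups are added only for inspection.

```python
import re

text = "++ +z"

# Capture groups show how the two halves split the match after backtracking.
m = re.search(r"([+]*)(\S+)", text)

start = m.start()    # where the match begins in the string
whole = m.group(0)   # the overall match
prefix = m.group(1)  # what [+]* kept after giving one "+" back
word = m.group(2)    # what \S+ took

print(start, repr(whole), repr(prefix), repr(word))
```

The match still starts at position 0: [+]* first grabs "++", \S+ fails on the space, [+]* gives back one "+", and \S+ accepts it. The engine never needs to try starting at the "+z" further along.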

          Originally posted by mtmosier
          They're greedy, but they're also persistent. When [+]* matches "++", but \S+ fails to match the space, it'll go back and try matching less for [+]*, namely only "+". At that point it matches the second "+" with \S+. It won't move on to a later starting position until it exhausts all possibilities at the current one.

          Your explanation makes sense. So my 'workaround' is the right solution:
          /[+]*[^+\s]+/

          "Matching + is good, but not as the only non-whitespace char".
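A quick check of that idea on both test strings from this thread, as a sketch. It assumes the second character class is meant to be the negated class [^+\s], which is what "any char except + and whitespace" describes, and it uses Python's `re` module since the thread does not name a language.

```python
import re

# Optional leading pluses, then at least one character that is neither
# "+" nor whitespace, so a lone "+" can never satisfy the pattern.
pattern = re.compile(r"[+]*[^+\s]+")

first = pattern.search("+ +z a").group(0)  # first valid word
tokens_a = pattern.findall("+ +z a")       # every token in the first string
tokens_b = pattern.findall("++ +z")        # every token in the second string

print(first, tokens_a, tokens_b)
```

Because the word part can no longer consume a "+", backtracking at the leading pluses runs out of options and the engine moves on to the next starting position, which is why the stray "+" and "++" no longer come back as tokens.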

          Thanks for helping me understand!
