[Resolved] regex not working, why?

ansuz

Can someone help me figure something about regexes?

Say I have something like:
$str = "+ +z a"

I want to match words up to a whitespace or end of string. They may be preceded by a +, but I do not want to match a + alone. A word consists of any characters except + and whitespace.

So I wanted to match +z (first valid word in string).

I tried this:
/[+]*\S+/

It matched "+". Not what I need.

I tried this:
/[+]*[^+\s]+/

It matched "+z".

It is like \S is a valid match even if it does not match anything. Why is that? Is there a way to change this behavior?

Thanks!

mtmosier

Seems to me \S was a valid match. Of "+". [+]* was a valid match of nothing. (* means 0 or more)
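
If you want to see that split for yourself, here is a minimal sketch. It is in Python rather than whatever language the original snippet is in, on the assumption that Python's `re` module treats these particular constructs (`*`, `+`, `\S`, character classes) the same way. The capture groups are added purely for illustration so each half of the pattern can be inspected separately.

```python
import re

text = "+ +z a"

# Same pattern as in the question, with each half wrapped in a group
# (the groups are only there so we can see which part matched what).
m = re.search(r"([+]*)(\S+)", text)

whole = m.group(0)                   # the overall match
plus_part, word_part = m.groups()    # what [+]* took, what \S+ took

print(repr(whole), repr(plus_part), repr(word_part))
```

The first group comes back empty and the second one holds the "+", which is the situation described above.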

ansuz

Originally posted by mtmosier
Seems to me \S was a valid match. Of "+". [+]* was a valid match of nothing. (* means 0 or more)

Your point is quite interesting. I hadn't looked at it from this angle.

What bothers me is: my quantifiers are greedy. If my string is:
"++ +z"

Then the following regex should match all pluses, then short of a \S match it should give up on the leading "++" and match on "+z". Don't you think so?
/[+]*\S+/

My program works well already, but I'd like to fine tune my regex so it does not return useless tokens.

Thanks for your help :-)

mtmosier

Originally posted by ansuz
What bothers me is: my quantifiers are greedy. If my string is:
"++ +z"

Then the following regex should match all pluses, then short of a \S match it should give up on the leading "++" and match on "+z". Don't you think so?
/[+]*\S+/

They're greedy, but they're also persistent. When [+]* matches "++", but \S+ fails to match the space, it'll go back and try matching less for [+]*, being only the first "+". At that point it matches the second "+" with \S+. It won't move on to a later position in the string until it exhausts all possibilities at the current one.
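
Here is a rough way to watch that happen, again sketched in Python on the assumption that its `re` module backtracks the same way for this pattern. The loop is a hand simulation, not how the engine is actually implemented: it pins `[+]*` to a fixed count with `{n}` and tries the counts in the order a greedy quantifier would give them up. The `attempts` list is a hypothetical helper name.

```python
import re

text = "++ +z"

# Hand simulation of the backtracking at position 0:
# force [+] to take exactly n pluses, starting with the greedy maximum.
attempts = []
for n in (2, 1):
    m = re.match(r"[+]{%d}\S+" % n, text)
    attempts.append((n, m.group(0) if m else None))
print(attempts)

# The real pattern, with groups added to show the final split.
real = re.search(r"([+]*)(\S+)", text)
print(real.span(), real.groups())
```

With two pluses taken, \S+ is left facing the space and the attempt fails. With one plus taken, \S+ picks up the second "+", so the overall match is "++" at the very start of the string and the engine never gets as far as "+z".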

ansuz

Originally posted by mtmosier
They're greedy, but they're also persistent. When [+]* matches "++", but \S+ fails to match the space, it'll go back and try matching less for [+]*, being only the first "+". At that point it matches the second "+" with \S+. It won't move on to a later position in the string until it exhausts all possibilities at the current one.

Your explanation makes sense. So my 'workaround' is the good solution:
/[+]*[^+\s]+/

"Matching + is good, but not as the only non-whitespace char".
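
For anyone who finds this thread later, a quick check of that regex against both strings from this discussion. This is a sketch in Python, assuming its `re` module behaves like the engine used above for this pattern; the variable names are only for illustration.

```python
import re

# A word is one or more chars that are neither + nor whitespace,
# optionally preceded by any number of pluses.
pattern = re.compile(r"[+]*[^+\s]+")

first = pattern.search("+ +z a").group(0)    # first valid word
second = pattern.search("++ +z").group(0)    # leading "++" is skipped
tokens = pattern.findall("+ +z a")           # all valid words

print(first, second, tokens)
```

Since [^+\s]+ cannot be satisfied by a plus, backtracking on [+]* no longer rescues a match made only of pluses, and the lone "+" and "++" tokens drop out.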

Thanks for helping me understand!