RegExp: how to exclude a pattern from the search string

marcnyc · Dec 17, 2002

Ok, so I have learned how to look for a case insensitive pattern in the search string of a search and replace preg_replace function:

$interview = preg_replace('/(pattern1|pattern2)/i','pattern_new',$interview);

Can anybody tell me how to exclude a pattern3 (for example) from the search string? In other words, how can I specify what to look for and what not to look for? I want it to look for pattern1 and pattern2 and replace them with pattern_new, but if it finds pattern3 I want it NOT to replace it with pattern_new.

Thanks

no1uknowdfw · Dec 17, 2002

How about this...

Build an "If Else" statement above this.

Save your pattern in a common variable.

preg_replace the common variable...

128kb_com · Dec 17, 2002

Try playing with the ^ meta-character, it's used for negation. You can combine it with other meta-characters...

http://www.phpbuilder.com/board/newreply.php?s=&action=newreply&threadid=10221442

marcnyc · Dec 17, 2002

can you tell me how to use ^ because I have tried something in that direction myself but I was probably doing it wrong because it didn't work:

$interview = preg_replace('/(pattern1|pattern2)^{(pattern3|pattern4)/i','pattern_new',$interview);}

Can you tell me what the right syntax is?

128kb_com · Dec 17, 2002

I'm sorry, I should have been more specific.
The ^ is used for negation only in a character class []. Depending on pattern3, it may not work as expected. Post your patterns 1, 2, 3 and 4 so we can see what you're working with.
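To illustrate the point about ^, here is a minimal sketch (the sample strings are made up for the example). Inside [] the ^ negates a set of single characters, which is why it cannot say "not the word pattern3"; outside a character class it is just the start-of-string anchor.

```php
<?php
// [^<] matches any ONE character that is not "<".
// It negates single characters, not whole words or patterns.
$noAngle = preg_replace('/[^<]/', '-', 'a<b');

// Outside a character class, ^ anchors the match to the start of the string,
// so only the leading "a" is replaced here.
$anchored = preg_replace('/^a/', 'X', 'aaa');
```

Here $noAngle ends up as "-<-" and $anchored as "Xaa".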

marcnyc · Dec 17, 2002

no problem, I thank you for your kind availability and your prompt replies...

my patterns are pretty straight-forward and pretty much just text...

The name of my site is Chain D.L.K. and I simply wanna replace any instance of:

with a link to the homepage.
So I did this:

The only problem is that if a text already contains a link in html format (such as for example <a href="http://www.chaindlk.net/reviews/">Reviews</a>) it will be converted into an ugly thing like this:

<a href="http://www.<a href="http://www.chaindlk.com" target="_top">Chain D.L.K.</a>.net/reviews/">Reviews</a>

Therefore I want my YES-REPLACE patterns to be the ones above and I was thinking that my NO-DON'T-REPLACE pattern should be:

/www.chaindlk/i

which I think should take care of the problem, am I right about that?

128kb_com · Dec 17, 2002

I haven't tested this but:

"!(<a href=\"http://www.)?(chain[\s]?d.?l.?k.?)(.(com|net))?(<a/>)?!i"

marcnyc · Dec 17, 2002

afraid it doesn't work...

out of curiosity: are you saying you can also use exclamation marks instead of slashes?

Weedpacket · Dec 18, 2002

PCRE regular expressions allow beasties called "assertions" - they allow you to demand that your matched strings must (or must not) be followed (or preceded) by certain patterns, without actually including them in the match.

The assertion can be any regular expression, with the sole restriction that so-called "lookbehind" assertions (those that make statements about what may precede the match) can only be a fixed length (basically, no * or + or ?).
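A small sketch of what such assertions look like in practice, assuming made-up sample strings. The first call uses a negative lookahead, the second a negative lookbehind of fixed length, in line with the restriction just described.

```php
<?php
// Negative lookahead: match "foo" only when it is NOT followed by "bar".
$a = preg_replace('/foo(?!bar)/', 'X', 'foobar foobaz');

// Negative lookbehind: match "chaindlk" only when it is NOT preceded by "www.".
// The lookbehind is exactly four characters long, so it is fixed length.
$b = preg_replace('/(?<!www\.)chaindlk/i', 'X', 'www.chaindlk.com and ChainDLK');
```

In both cases the asserted text is checked but never becomes part of the match: $a is "foobar Xbaz" and $b is "www.chaindlk.com and X".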

And yes, you can use exclamation marks instead of slashes to delimit your regexps - you can use pretty much any character in fact. Perl programmers found that it got a bit silly when they were matching strings that contained plenty of / characters if they had to escape them with \ as well (the result is known as "falling toothpick syndrome"). So they rewrote the regexp engine so that they could start and end the regexp with any particular character of their choosing - usually one that doesn't appear in the pattern they're trying to match.
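As a quick illustration of the delimiter point, the two calls below test the same (made-up) URL. This is only a sketch; the variable names are invented for the example.

```php
<?php
$url = 'http://www.chaindlk.com/reviews/';

// With / as the delimiter, every literal slash needs a backslash in front of it.
$withSlashes = preg_match('/^http:\/\/www\.chaindlk\.com\/reviews\/$/', $url);

// With # as the delimiter, the slashes can stay as they are.
$withHashes = preg_match('#^http://www\.chaindlk\.com/reviews/$#', $url);
```

Both calls return 1; the second is simply easier to read.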

marcnyc · Dec 18, 2002

so are you saying it is possible or not? if yes can you give me an example so I can learn?? I know that you are probably trying to avoid giving me a finished solution in order to stimulate my thinking but unfortunately there is an ocean of difference in knowledge of RegExp (and PHP for that matter) between me and you :-(

Weedpacket · Dec 18, 2002

Oh, yeah, forgot. Bit rushed right now.

For convenience's sake, instead of saying (chaindlk|Chain dlk|Chain d.l.k|etc.) I'm just going to say thingy.

Okay: match thingy so long as it doesn't appear inside an <a> entity. Well, a string is "inside" such an entity if, when reading forwards from it, an </a> is sighted before an <a or the end of the string (assuming that the string doesn't itself cut short in the middle of a link). This applies both to the <a> tag's attributes and to text between the <a> and </a> tags (you can't nest links). (It also applies to attributes of other tags in between, unfortunately - so if they contain the pattern we're looking for we're in trouble.)

That's inside though - we want outside. Basically, we want a pattern that will successfully match whenever it starts from a point outside a tag. In other words, we want to match any and all characters up to the next <a or the end of the string (whichever comes first), but only if it doesn't match </a> along the way.

To be brief, a pattern that does this is:

((?!</a>).)*(<a|$)

To dissect that:
(<a|$)
Is pretty self-explanatory - "<a or the end of the string".

(?!</a>)
That's our assertion. Wherever we are right now, for a match to succeed, the next characters must not be </a>.

(?!</a>).
By preceding the usual "match any character" symbol by our assertion, we've ensured that we only match a character if it is not the start of an </a> tag. Just saying [^<] instead of . won't work, because < could quite happily appear in many other situations as well, and we're quite happy for it to do so.

((?!</a>).)*
Zero or more occurrences. This pattern will match any string up to the first </a> string it sees (or the end of the string, whichever comes first) and stop there. The </a> is not included as part of the match (an assertion is only a condition the match must satisfy - not part of the match itself).

((?!</a>).)*(<a|$)
And of course we want to keep matching until we find an <a.
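To see the "stop before </a>" behaviour on its own, here is a minimal sketch that applies just the repeated piece, ((?!</a>).)*, to two made-up strings.

```php
<?php
// Match from the start of each string up to (not including) the first </a>.
preg_match('#((?!</a>).)*#', 'Reviews</a> more', $inside);
preg_match('#((?!</a>).)*#', 'plain text <b>bold</b>', $plain);

// $inside[0] holds "Reviews".
// $plain[0] holds the whole second string, because a bare < (as in <b> or </b>)
// does not trip the assertion; only the exact sequence </a> does.
```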

Well, that's our pattern for matching strings that are outside tags - if we're going to match anything, it must also satisfy this condition. Let's assert it - positively, this time, because this condition must be satisfied.

(?=((?!</a>).)*(<a|$))

And what is it we're matching? I've been calling it thingy. Sticking in the regexp delimiters and a couple of appropriate modifier characters:

#(thingy)(?=((?!</a>).)*(<a|$))#is

And the bit of text that matched thingy (and that is actually the only bit that is matched, because everything else is wrapped up in that ?= assertion) will be in \1, ready for use in the replacement text of a preg_replace().
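Putting it together, a sketch of how the finished pattern might be used with preg_replace(). The function name linkSiteName and the expression standing in for thingy (chain\s?d\.?l\.?k\.?) are assumptions made for this example, not the exact patterns from the thread.

```php
<?php
// Hypothetical helper: wrap the site name in a link unless it is already
// inside an <a ...>...</a> element (attributes included).
function linkSiteName($text) {
    // "thingy" here is an assumed stand-in for the poster's real alternatives.
    $pattern = '#(chain\s?d\.?l\.?k\.?)(?=((?!</a>).)*(<a|$))#is';
    // \1 is the text that matched thingy; the lookahead consumes nothing.
    $replacement = '<a href="http://www.chaindlk.com" target="_top">\1</a>';
    return preg_replace($pattern, $replacement, $text);
}

$in  = 'Visit Chain D.L.K. or read <a href="http://www.chaindlk.net/reviews/">Reviews</a> today.';
$out = linkSiteName($in);
```

In this sample the plain-text "Chain D.L.K." gets linked, while the "chaindlk" inside the existing href is left alone, since reading forwards from it reaches </a> before any <a or the end of the string.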

There are some (rather brief) notes on the subject of assertions and other bits of PCRE syntax in the manual. Very small examples, don't really give a flavour of what they're good for.

(And I notice vBulletin has seen fit to insert some spurious spaces in there.)

marcnyc · Dec 18, 2002

Hi WP, that's a good explanation! Thanks a lot... I should start saving all these threads for future reference (I wonder how long PHPBuilder keeps them online...)...

Your code works flawlessly in my case...

But I am left wondering how to write the syntax of a RegExp that matches n patterns but doesn't match m patterns, which was the initial question.

Your code solves my problem because the pattern I don't want to match happens to be what is in a url, but if you don't mind me asking (just cause I am eager to learn), what would I do if I want to match "foo" and "bar" but I don't want to match "marcnyc" and "weedpacket"?

My problem is solved so I have no rush with this question, just plain curious to understand and learn...

Weedpacket · Dec 19, 2002

Originally posted by marcnyc

Your code solves my problem because the pattern I don't want to match happens to be what is in a url, but if you don't mind me asking (just cause I am eager to learn), what would I do if I want to match "foo" and "bar" but I don't want to match "marcnyc" and "weedpacket"?

The gadget you're describing is the negative assertion. It can't be an ordinary match, of course, because the pattern you'd be describing must not appear in the string you're matching. The conventional gadget for matching strings that do not contain a pattern thingy is the bit that goes
((?!thingy).)*
That will match the entire string from the current position up to (but not including) the first occurrence of thingy. If you don't want to match the string at all if it contains thingy, then you'd just put the usual ^$ anchors around the expression:
^((?!thingy).)*$
In other words - this pattern only succeeds if thingy doesn't appear anywhere between the start and the end of the string. You can assert this as part of a more elaborate pattern:
^(?=((?!thingy).)*$)rest of the pattern
Which says that, from the start of the string, the following pattern must not contain thingy at any point up until the end of the string, and then goes on to describe the pattern it is supposed to be matching; but in general it's easier to do this sort of thing in two steps - first ensure the string doesn't contain thingy, and only if it doesn't, match the rest of the pattern.
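The two-step approach could look something like the sketch below, using the "foo"/"bar" and "marcnyc"/"weedpacket" example from the question. The function name replaceUnlessForbidden is made up for illustration.

```php
<?php
// Hypothetical helper: replace foo/bar only if the text contains
// neither "marcnyc" nor "weedpacket" anywhere.
function replaceUnlessForbidden($text) {
    // Step 1: ^((?!thingy).)*$ succeeds only when no position in the
    // string is the start of a forbidden word.
    $isClean = preg_match('/^((?!marcnyc|weedpacket).)*$/is', $text) === 1;
    if (!$isClean) {
        return $text; // leave the text untouched
    }
    // Step 2: the ordinary case-insensitive replacement.
    return preg_replace('/(foo|bar)/i', 'pattern_new', $text);
}

$changed   = replaceUnlessForbidden('Foo and bar');
$unchanged = replaceUnlessForbidden('foo by Weedpacket');
```

With these inputs $changed is "pattern_new and pattern_new" and $unchanged is returned as it came in. A plain preg_match('/marcnyc|weedpacket/i', $text) would do the same job in step 1; the anchored form is used here only to show the pattern from the post at work.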

marcnyc · Dec 19, 2002

Hi WP, thanks for the explanation. I'll have to go through it and learn... In the meantime I was wondering if I may address your attention towards the other thread we were discussing because the code we were using turned out not to work... Thanks!

Duey · Dec 19, 2002

why cant you just loop str_replace?

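<?php
// Every spelling is listed because str_replace() is case-sensitive.
$word_change = array("Chain D.L.K.", "ChainD.L.K.", "ChainDLK", "Chain DLK", "chaindlk.com", "chaindlk.net");
for ($i = 0; $i < sizeof($word_change); $i++) {
    // str_replace() returns the new string, so assign it back to $interview.
    $interview = str_replace($word_change[$i], "<a href=\"http://www.chaindlk.com\" target=\"_top\">$word_change[$i]</a>", $interview);
}
?>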

im pretty sure that will work i dunno

marcnyc · Dec 19, 2002

Like I said, I can't just loop str_replace because if I do when the text actually contains "http://www.chaindlk.com" the result will be "http://www.<a href="http://www.chaindlk.com" target="_top">.com" because the word chaindlk inside the URL matches one of the patterns in your word_change array... Not to mention that to my knowledge you can't make the array case insensitive like with RegEx (in your case I would have to enter all possible combinations of uppercase and lowercase wordings of our site's name)...
