Ignore link in certain format - regex

pedroz

Hi regex masters!

I have this regex:
/((http(s?):\/\/{1}\S+))/
which lets me fetch a link from a text string:

$string = 'asdsa asdsa http://www.google.com asdasd';
// link to be fetched by the regex

$pattern = '/((http(s?):\\/\\/{1}\S+))/';
preg_match($pattern, $string, $match);
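For reference, here is a small sketch of what that pattern actually captures (assuming PHP 7 or later). Because of the nested parentheses, groups 1 and 2 both hold the whole URL, and group 3 is just the optional "s". The {1} quantifier is redundant, since a token already matches exactly once. The $simpler pattern is a hypothetical equivalent that uses ~ as the delimiter so the slashes need no escaping; it is not something proposed in the thread.

```php
<?php
$string = 'asdsa asdsa http://www.google.com asdasd';

// Original pattern. The nested parentheses make groups 1 and 2 both the whole URL;
// group 3 is "s" for https and "" for http.
$pattern = '/((http(s?):\\/\\/{1}\S+))/';

// Hypothetical simplified equivalent: {1} dropped, and the ~ delimiter means
// "/" no longer needs escaping.
$simpler = '~https?://\S+~';

preg_match($pattern, $string, $match);
preg_match($simpler, $string, $match2);

echo $match[0], "\n";  // http://www.google.com
echo $match2[0], "\n"; // http://www.google.com
```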

However, I would like to improve it in order to ignore the following link (the one labelled 'abc' in $string):

$string = '[abc](http://www.google.com "def")';
// link to be IGNORED by the regex

$pattern = '/((http(s?):\\/\\/{1}\S+))/';
preg_match($pattern, $string, $match);
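To make the problem concrete, a quick sketch (assuming PHP 7 or later) showing that the current pattern cannot tell the two strings apart: it only describes the URL itself, and \S+ stops at the space before "def", so the same URL is captured in both cases.

```php
<?php
$pattern = '/((http(s?):\\/\\/{1}\S+))/';

$plain    = 'asdsa asdsa http://www.google.com asdasd';
$markdown = '[abc](http://www.google.com "def")';

// Both calls return 1: the pattern says nothing about the "[abc](" in front of the URL.
$plainHit    = preg_match($pattern, $plain, $m1);
$markdownHit = preg_match($pattern, $markdown, $m2);

// \S+ stops at the space before "def", so the same URL is captured both times.
echo $m1[1], "\n"; // http://www.google.com
echo $m2[1], "\n"; // http://www.google.com
```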

Many thanks

sfullman

What you are asking for may be difficult, and you may need to do a secondary process.
I'm also not clear on what you are trying to match as the criterion for exclusion. For example, is it a URL that is preceded by a parenthesis? If so, that's easy:

$pattern='/[^(](http(s?):\\/\\/etc,etc/';
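A sketch of the character-class idea, assuming the goal is to skip any URL that directly follows an opening parenthesis "(". A class like [^(] has to consume the character in front of "http", so on its own it would also miss a URL at the very start of the string; the (?:^|[^(]) alternation below is an addition to cover that case, and the URL is read from group 1 rather than the full match.

```php
<?php
// [^(] must match (and consume) the character before "http", so a URL right after
// "(" is skipped. The ^ alternative lets a URL at the very start of the string match.
// The URL itself is captured in group 1.
$pattern = '~(?:^|[^(])(https?://\S+)~';

$texts = [
    'asdsa asdsa http://www.google.com asdasd',
    '[abc](http://www.google.com "def")',
    'http://example.com at the start',
];

$found = [];
foreach ($texts as $text) {
    $found[] = preg_match($pattern, $text, $match) ? $match[1] : null;
}

var_dump($found); // "http://www.google.com", NULL, "http://example.com"
```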

Also, I think you will want to use preg_match_all() if there are multiple URLs present, yes?
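A minimal sketch of the preg_match_all() suggestion, assuming PHP 7 or later and keeping the original pattern unchanged. It shows that every URL in the string is returned, so the Markdown-style link is still included here; excluding it is a separate step.

```php
<?php
$pattern = '/((http(s?):\\/\\/{1}\S+))/';
$string  = 'asdsa http://www.google.com asdasd [abc](http://www.example.com "def")';

// preg_match() stops at the first hit; preg_match_all() returns every match.
// With the default PREG_PATTERN_ORDER, $matches[1] lists group 1 of each match.
$count = preg_match_all($pattern, $string, $matches);

echo $count, "\n";    // 2
print_r($matches[1]); // both URLs, including the one inside [abc](...)
```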

Again, excluding anything other than a single character or a boundary is difficult in a regexp.

sam

Weedpacket

sfullman wrote:
Again, excluding anything other than a single character or a boundary is difficult in a regexp.

Unless you use negative assertions. With those you can say "match this unless it is followed by (or preceded by) this".
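A sketch of the negative-assertion approach in PHP's PCRE syntax, again assuming the exclusion rule is "the URL directly follows (". The lookbehind (?<!\() only checks the preceding character without consuming it, so the full match is the bare URL and a URL at the start of the string still matches. If only Markdown-style links should be skipped, (?<!\]\() would check for "](" instead; both are fixed-length lookbehinds, which PCRE requires.

```php
<?php
// (?<!\() is a negative lookbehind: the match fails if the character just before
// "http" is "(", but nothing is consumed, so $matches[0] holds the bare URL.
$pattern = '~(?<!\()https?://\S+~';

$string = 'asdsa http://www.google.com asdasd [abc](http://www.google.com "def")';
preg_match_all($pattern, $string, $matches);

print_r($matches[0]); // only the first URL
```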

pedroz wrote:
However, I would like to improve it in order to ignore the following link

Wouldn't it be easier to allow the string to be matched, and then look at what was matched and discard it if it's of the "wrong" form? Of course, that depends on what the form actually is, something you haven't explained, as sfullman remarks. Why don't you want that URL to match? Is it because it's in parentheses? Because it's in parentheses along with another string? Because it follows the string "[abc]"? Because it follows any string in square brackets? Because "abc" is not the same string as "def"?
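A sketch of the match-then-discard approach, assuming PHP 7.1 or later (for the [$url, $offset] destructuring). findUrls is a hypothetical helper name, and "the character before the URL is (" is only one of the candidate criteria listed above; swap in whichever rule actually applies.

```php
<?php
// Hypothetical helper: match every URL first, then discard the ones whose
// preceding character is "(". The "preceded by (" rule is only an assumed criterion.
function findUrls(string $text): array
{
    // PREG_OFFSET_CAPTURE makes each match a [string, byte offset] pair
    preg_match_all('~https?://\S+~', $text, $matches, PREG_OFFSET_CAPTURE);

    $kept = [];
    foreach ($matches[0] as [$url, $offset]) {
        $before = $offset > 0 ? $text[$offset - 1] : '';
        if ($before === '(') {
            continue; // looks like the (url "title") part of [abc](url "title")
        }
        $kept[] = $url;
    }
    return $kept;
}

print_r(findUrls('asdsa http://www.google.com asdasd [abc](http://www.google.com "def")'));
```

Keeping the regex simple and putting the rule in ordinary code makes it easy to change the criterion later without touching the pattern.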

sfullman

Weed is right on all points, and I did not know about assertions. However, I still think it would be better to match everything and then exclude by a secondary process. I'm almost willing to bet that if you can come up with an exclusion rule, I can find a way around it, if you're dealing with any latitude in user input at all, which you usually are.
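Another way to do a secondary process, sketched under the assumption that the links to ignore are Markdown-style [text](url "optional title"): strip those out first, then collect whatever bare URLs remain. bareUrls is a hypothetical helper, and the link pattern is deliberately simplified (no nested brackets, no ")" inside the URL), which is exactly the kind of gap user input tends to find.

```php
<?php
// Hypothetical helper: first delete anything shaped like a Markdown link
// [text](url "optional title"), then collect the bare URLs that remain.
// Simplified: nested brackets and ")" inside the URL are not handled.
function bareUrls(string $text): array
{
    $withoutLinks = preg_replace('~\[[^\]]*\]\([^)]*\)~', '', $text);
    preg_match_all('~https?://\S+~', $withoutLinks, $matches);
    return $matches[0];
}

print_r(bareUrls('asdsa http://www.google.com asdasd [abc](http://www.google.com "def")'));
```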

Weedpacket

It might also help to take one step back and explain how these strings come about in the first place. Perhaps something could be done earlier that would avoid having to do this at all.

sfullman

yep it would be very good if I were personally able to tell Amazon, eBay, Authorize.net, Google, Windows, BBEdit, even Linux, "change this and do it right in the first place and make my job easier" 😉

Weedpacket

sfullman wrote:
"change this and do it right in the first place and make my job easier"

That assumes they're doing it wrong 😉