Regular Expression Help

ljschrenk · Jan 31, 2005

Hey Everyone,

I'm trying to get a grasp on regular expressions, but for some reason they've always been really confusing. I need a regular expression that will strip out urls and replace them with nothing. Something that will strip out the following examples:

http://www.yahoo.com
www.yahoo.com
yahoo.com
123.com
www.123.com

or any other combinations I may have missed. Here's the regular expression I have so far; it does not work properly. Any help would be greatly appreciated.

[\/:\w.]*(.com|.org|.net|.edu|.info|.us|.gov)

Thanks!
Lisa

ripat · Feb 1, 2005

Try this pattern

\b.*\.(com|org|net|edu|info|us|gov)\b

But what about yahoo.fr or www.123.be ? The list in the alternation will be loooooong!

Edit: If you don't want to list all the domain extension codes, you could try

\b.*\.[a-z]{2,3}\b

But it will catch anything ending with a group of 2 or 3 alphabetic characters, so it will also catch explorer.exe, for example.
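For anyone following along, here is a small JavaScript sketch of the trade-off between the two patterns above. The variable names and the sample strings are only illustrative, and the patterns are copied as posted rather than tuned:

```javascript
// ripat's two patterns, written as JavaScript regex literals.
// No "g" flag here, so test() keeps no state between calls.
const listedTlds = /\b.*\.(com|org|net|edu|info|us|gov)\b/;
const anyShortSuffix = /\b.*\.[a-z]{2,3}\b/;

// Illustrative inputs: one listed TLD, two country codes, one file name.
const samples = ["www.yahoo.com", "yahoo.fr", "www.123.be", "explorer.exe"];

for (const s of samples) {
  // Prints the sample, then whether each pattern matches it.
  console.log(s, listedTlds.test(s), anyShortSuffix.test(s));
}
// The explicit list misses yahoo.fr and www.123.be;
// the generic suffix pattern matches all four, including explorer.exe.
```

So the explicit list is too narrow and the generic suffix is too wide; which one hurts less depends on the text being cleaned.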

xblue · Feb 1, 2005

Hi Lisa,

if the domains are contained somewhere in the middle of a text, you should use ungreedy matching (.*? instead of .*) when trying ripat's patterns.

Concerning the country-code TLDs (if you want to match them): imho they always have 2 letters, so

\.(com|org|net|edu|info|us|gov|[a-z]{2})

may be an alternative.

It would still not match yahoo.co.uk, but I have no idea if you actually need this to be matched as well.
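A quick JavaScript sketch of the yahoo.co.uk case, assuming the alternation is combined with ripat's prefix, an ungreedy .*?, an escaped dot, and a trailing \b. The strip helper and the test strings are made up for illustration:

```javascript
// Ungreedy prefix plus the alternation with a 2-letter country-code fallback.
const withCountryCodes = /\b.*?\.(com|org|net|edu|info|us|gov|[a-z]{2})\b/;

// Hypothetical helper: remove the first match from the text.
function strip(text) {
  return text.replace(withCountryCodes, "");
}

console.log(JSON.stringify(strip("yahoo.fr")));    // ""
console.log(JSON.stringify(strip("www.123.be")));  // ""
console.log(JSON.stringify(strip("yahoo.co.uk"))); // ".uk"
```

The last line shows the problem: the ungreedy match stops at "yahoo.co" as soon as [a-z]{2} is satisfied, so ".uk" is left behind.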

ljschrenk · Feb 1, 2005

Hey Guys!

Thanks so much for all your help, I've been struggling with this for 3 days. All the examples that were given work perfectly in Regex Coach; the problem I'm running into is when I implement them in JavaScript. I know this is a PHP forum but sometimes I like to shake things up a bit. Hahah. Do you know if JS requires regular expressions to be in a set?

Thanks again!!
Lisa
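For reference, a minimal sketch of how a pattern like ripat's first one could be used from JavaScript. A regex literal goes between slashes with the flags after the closing slash; when the pattern is built from a string with new RegExp, each backslash has to be doubled. The g and i flags and the variable names are choices made for this sketch, not something the thread specifies:

```javascript
// The same pattern written two ways.
const urlLiteral = /\b.*\.(com|org|net|edu|info|us|gov)\b/gi;
const urlFromString = new RegExp("\\b.*\\.(com|org|net|edu|info|us|gov)\\b", "gi");

// The five examples from the first post, each as a standalone string.
const examples = [
  "http://www.yahoo.com",
  "www.yahoo.com",
  "yahoo.com",
  "123.com",
  "www.123.com",
];

// replace() with an empty string strips whatever the pattern matched.
const stripped = examples.map((s) => s.replace(urlLiteral, ""));

console.log(stripped); // five empty strings
console.log(urlLiteral.source === urlFromString.source); // true
```

This only covers strings that consist of the URL alone; text with a URL in the middle of a sentence behaves differently.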

ripat · Feb 1, 2005

I stand corrected. I just tried my regex above (should have done it before, really) and even with an ungreedy quantifier (.*?) it will not work if the URLs are embedded in a text. As the dot also matches the space, it will capture all the text preceding the first (com|edu|uk…) it meets.
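The effect is easy to reproduce in JavaScript. The sample sentence below is made up. The second pattern, which swaps the dot for \S (any non-whitespace character), is an assumption of this sketch and does not come from the thread:

```javascript
const text = "Please visit www.yahoo.com for details";

// Ungreedy dot: the match still starts at the first word boundary in the
// text, and the dot happily consumes spaces on the way to ".com".
const lazyDot = /\b.*?\.(com|org|net|edu|info|us|gov)\b/;
console.log(text.match(lazyDot)[0]); // "Please visit www.yahoo.com"

// Assumed variant: \S cannot cross a space, so the match is forced to
// start at the beginning of the URL itself.
const nonSpace = /\b\S*?\.(com|org|net|edu|info|us|gov)\b/;
console.log(text.match(nonSpace)[0]); // "www.yahoo.com"
console.log(text.replace(nonSpace, "")); // "Please visit  for details"
```

Ungreedy matching only shortens where a match ends; it does not move where the match starts, which is why the text in front of the URL gets swallowed.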

How about this:
B[/B]

This should match any embedded URLs even if there is a newline character in the middle of them.

Not sure how it will work in the JS regex flavor. Just try it and let us know.
