Hey Everyone,

I'm trying to get a grasp on regular expressions, but for some reason they've always been really confusing. I need a regular expression that will strip out URLs and replace them with nothing. Something that will strip out the following examples:

http://www.yahoo.com
www.yahoo.com
yahoo.com
123.com
www.123.com

or any other combinations I may have missed. Here's the regular expression I have so far, which does not work properly. Any help would be greatly appreciated.

[\/:\w.]*(.com|.org|.net|.edu|.info|.us|.gov)

Thanks!
Lisa

    Try this pattern

    \b.*\.(com|org|net|edu|info|us|gov)\b 

    But what about yahoo.fr or www.123.be? The list in the alternation will be loooooong!

    Edit: If you don't want to list all the domain extension codes, you could try

    \b.*\.[a-z]{2,3}\b

    But it will catch anything ending in a group of 2 or 3 letters, so it will also catch explorer.exe, for example.
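
    To see what that last pattern catches, here is a quick check, written in JavaScript since that is where the pattern will end up being used. The sample strings and variable names are made up for illustration.

    ```javascript
    // The 2-or-3-letter pattern from above as a JavaScript regex literal.
    const anyShortSuffix = /\b.*\.[a-z]{2,3}\b/;

    // Made-up sample strings: three domains and two file names.
    const samples = ["yahoo.fr", "www.123.be", "yahoo.info", "explorer.exe", "notes.txt"];

    // Keep only the strings in which the pattern finds a match.
    const matched = samples.filter((s) => anyShortSuffix.test(s));

    console.log(matched);
    ```

    The file names get through just like the domains, and yahoo.info is left out because its extension has 4 letters.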

      Hi Lisa,

      if the domains are contained somewhere in the middle of a text, you should use ungreedy matching (.*? instead of .*) when trying ripat's patterns.

      Concerning the country-code TLDs (if you want to match them): imho they always have 2 letters, so

      \.(com|org|net|edu|info|us|gov|[a-z]{2})

      may be an alternative.

      It would still not match yahoo.co.uk, but I have no idea if you actually need this to be matched as well.
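
      A small sketch of what that looks like in practice, in JavaScript. It assumes the alternation above is written with an escaped dot and combined with the ungreedy prefix; the `strip` helper is a made-up name, not part of any library.

      ```javascript
      // Assumed combination: ungreedy prefix plus the alternation with 2-letter codes.
      const withCountryCodes = /\b.*?\.(com|org|net|edu|info|us|gov|[a-z]{2})\b/;

      // Hypothetical helper: remove the first match from a string.
      function strip(text) {
        return text.replace(withCountryCodes, "");
      }

      console.log(JSON.stringify(strip("yahoo.fr")));    // whole string removed
      console.log(JSON.stringify(strip("www.123.be")));  // whole string removed
      console.log(JSON.stringify(strip("yahoo.co.uk"))); // ".uk" is left behind
      ```

      For yahoo.co.uk the match stops after "co", since that already looks like a 2-letter code followed by a word boundary.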

        Hey Guys!

        Thanks so much for all your help, I've been struggling with this for 3 days. 😛 All the examples that were given work perfectly in Regex Coach; the problem I'm running into is when I implement them in JavaScript. I know this is a PHP forum but sometimes I like to shake things up a bit. Hahah. Do you know if JS requires regular expressions to be in a set?

        Thanks again!!
        Lisa

          I stand corrected. I just tried my regex above (I should have done that first, really), and even with an ungreedy quantifier (.*?) it does not work if the URLs are embedded in text. Since the dot also matches spaces, it captures all the text preceding the first (com|edu|uk…..) it encounters.
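
          Here is a short JavaScript check of that behaviour. The sample sentence is made up.

          ```javascript
          // The first pattern from this thread with an ungreedy quantifier.
          const lazy = /\b.*?\.(com|org|net|edu|info|us|gov)\b/;

          // Made-up sentence with a URL in the middle.
          const sample = "Please visit yahoo.com today";
          const found = sample.match(lazy);

          console.log(found.index); // 0
          console.log(found[0]);    // "Please visit yahoo.com"
          ```

          The ungreedy quantifier only makes the match end as early as possible. The match still starts at the first word boundary in the string, so everything before the URL is captured too.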

          How about this:
          B[/B]

          This should match any embedded URLs even if there is a newline character in the middle of one.

          Not sure how it will work in the JS regex flavor. Just try it and let us know.
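
          For trying things out on the JS side, a rough sketch of how a pattern can be applied there. The pattern below is only an assumption for illustration (it uses \S so a match cannot cross a space) and is not the one posted above. `stripUrls` and the input string are made-up names. In JavaScript a regex literal sits between slashes, with flags after the closing slash.

          ```javascript
          // Hypothetical pattern: \S*? stays inside one whitespace-delimited token.
          // Flags: g replaces every match, i ignores case.
          const urlPattern = /\b\S*?\.(?:com|org|net|edu|info|us|gov)\b/gi;

          // Hypothetical helper: replace every match with nothing.
          function stripUrls(text) {
            return text.replace(urlPattern, "");
          }

          // Made-up input with two URLs embedded in text.
          const input = "Go to http://www.yahoo.com or WWW.123.COM now";
          console.log(stripUrls(input));
          ```

          Without the g flag, replace() only removes the first match. The spaces around each removed URL stay in the text, so they may need collapsing afterwards.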
