Goal:
Create a regex to capture ALL URLs in a string.

The regex should find all URLs, but end each match as soon as it reaches either a whitespace character (i.e. \s in regex) OR a punctuation mark (?,.!'") followed by a whitespace character.

Here's what I have:

([h|H][t|T][t|T][p|P][s|S]?:\/\/)([w]{0,3}[.]{0,1}[a-zA-Z0-9.-]+.[a-zA-Z0-9]{2,10}[\s])

It works for everything. The only problem is that the match only ends when a whitespace character is detected. I need it to end when a whitespace character is detected OR a punctuation mark followed by a whitespace character is detected.

Any help is greatly appreciated!

    Not sure about your whitespace issue, but you can make your regex case-insensitive if you're using preg_match(). That would simplify the pattern considerably. Also, not every URL starts with 'www.'. You need to account for (multiple) subdomains.

      Actually, the regex does take into account multiple subdomains, and does not require the domain to start with www. 🆒

      I'm actually using preg_replace(). Does that allow the same case-insensitivity modifier (i)?

      Anyway, it's 99% correct and I know it's some stupid thing I'm missing. Can anyone help?

        Ah, OK. I see it now. You do indeed account for multiple subdomains and the optionality of 'www.' Instead of {0,1}, you can just use a question mark for better readability. Actually I don't think you need to specifically handle the 'www.' at all. Just treat it like any other subdomain and get rid of that part of the pattern.
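
        To make that concrete, here is a rough sketch of the simplified pattern, written in Python only so it can be run as-is. The names SIMPLE_URL and find_urls are made up for illustration, and escaping the dot before the TLD is my own assumption about what the original pattern intended. It only matches scheme plus host name, just as the original does.

```python
import re

# Hypothetical simplification of the pattern from the question:
# - "?" instead of "{0,1}"
# - no special handling for "www."; it is just another subdomain label
# - the dot before the TLD is escaped so it matches a literal "."
SIMPLE_URL = re.compile(
    r"https?://(?:[a-z0-9-]+\.)+[a-z0-9]{2,10}",
    re.IGNORECASE,  # replaces the [h|H][t|T]... character classes
)

def find_urls(text):
    """Return every scheme + host match found in text."""
    return [m.group(0) for m in SIMPLE_URL.finditer(text)]
```

        Because the pattern has to end on a 2 to 10 character alphanumeric label, a trailing period after the host name is left out of the match without any extra work.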

        As far as preg_replace() and case insensitivity, yes. Just add an 'i' after the closing / of your pattern.
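
        As for the original question, one possible approach (a sketch, not the only way) is a lazy match followed by a lookahead, so the match stops before any run of trailing punctuation that is itself followed by whitespace or the end of the string. It's shown in Python here so it can be run directly; re.IGNORECASE plays the role of the 'i' modifier, and the lookahead syntax is the same in PCRE. The names URL, extract_urls and link_urls are hypothetical, and the <a> tag replacement is only an example of what a preg_replace() call might do.

```python
import re

# Lazy \S+? grows one character at a time until the lookahead succeeds:
# optional trailing punctuation (?,.!'") followed by whitespace or end of string.
URL = re.compile(r"""https?://\S+?(?=[?,.!'"]*(?:\s|$))""", re.IGNORECASE)

def extract_urls(text):
    """Return all URLs in text, without trailing punctuation."""
    return URL.findall(text)

def link_urls(text):
    """Wrap each URL in an anchor tag (illustrative replacement only)."""
    return URL.sub(r'<a href="\g<0>">\g<0></a>', text)
```

        Punctuation inside the URL (such as the ? in a query string) is kept, because it is not followed by whitespace. Note that this sketch accepts any non-whitespace characters after the scheme, so it is looser than the host-only pattern in the question.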
