Given URLs of the form:

http://www.example.com
http://www.example.com/
http://www.example.net (any top-level domain: .net, .co.uk, etc.)
http://www.example.com/example.html?a=1&b=2
http://www2.example.com (any sub-domain)
http://example.com (no sub-domain)

I need some help with code that uses a Regular Expression to extract just the
second-level domain "example.com" from all forms of the full URL.

The code should take a URL as input and return the second-level domain
"example.com" as output.

    If it were just a matter of taking the last two period-separated words of the host, this would be a fairly simple regex. But as you alluded to in your question, there are two-part TLDs (such as .co.uk) to contend with, and that list can be long.

    To cover every possibility, you'd probably be better off getting the host from parse_url, exploding it on the period, checking the TLD against an array, and taking whatever label comes before the TLD as the second part of the domain.
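
    The steps above could be sketched roughly as follows. This is only a minimal sketch: the function name second_level_domain and the short sample list of two-part TLDs are assumptions made for illustration, and a real list would need to be much longer.

```php
<?php
// Hypothetical helper: the name and the sample TLD list are assumptions for
// illustration only; a complete list of two-part TLDs is much longer.
function second_level_domain(string $url, array $twoPartTlds = ['co.uk', 'com.au', 'co.nz', 'org.uk']): ?string
{
    // Let parse_url isolate the host instead of matching it by hand.
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // no host found (e.g. the URL has no scheme)
    }

    $labels = explode('.', strtolower($host));
    $count = count($labels);
    if ($count < 2) {
        return null; // a single label such as "localhost"
    }

    // Does the host end in one of the listed two-part TLDs?
    $lastTwo = $labels[$count - 2] . '.' . $labels[$count - 1];
    $tldParts = in_array($lastTwo, $twoPartTlds, true) ? 2 : 1;

    if ($count <= $tldParts) {
        return null; // the host is only a TLD, e.g. "co.uk"
    }

    // Keep the label in front of the TLD plus the TLD itself.
    return implode('.', array_slice($labels, -($tldParts + 1)));
}

echo second_level_domain('http://www2.example.com/example.html?a=1&b=2'), "\n";
echo second_level_domain('http://about.wipro.com.au/index.asp?a=bg.v&bs'), "\n";
```

    Any two-part TLD missing from the array is treated as a one-part TLD, so the result is only as good as the list passed in.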

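      This pattern is untested, but if it doesn't work you can probably tweak it until it does. Note that it only handles one-part TLDs, so it will not cope with .co.uk and the like.

      if (preg_match('/^https?:\/\/(?:[^.\/]+\.)*?([\w-]+\.[a-z]{2,})(?:[\/:?#]|$)/i', $url, $matches)) {
          $sec_level_domain = $matches[1];
      }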
      

      Reading the post above, I have to say I agree. I would use the parse_url function to break the URLs into smaller chunks, from which more reliable patterns can be built.

      Also, if you try the code above, beware that the backslashes escaping the forward slashes may not display in the browser.

        Got help from http://www.eliteskills.com/, who came up with...

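        <?php

        $url = "http://about.wipro.com.au/index.asp?a=bg.v&bs";

        echo "In: $url";

        // Strip the scheme, then everything from the first slash, "?" or "#" onward.
        $url = preg_replace('/^(https?|ftp):\/\//i', '', $url);
        $url = preg_replace('/[\/?#].*$/', '', $url);

        // Number of leading labels to drop so that two labels remain.
        $labels = explode('.', $url);
        $drop = count($labels) - 2;

        // Keep one more label when the host ends in a two-part TLD such as .com.au.
        if (preg_match('/\.(co|ca|com|org|net)\.[a-z]{2}$/i', $url)) {
            $drop--;
        }

        // Remove the leading labels ("about." in this example).
        if ($drop > 0) {
            $url = preg_replace('/[^.]+\./', '', $url, $drop);
        }

        echo "<br />Out: $url";

        ?>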

        The only problem is the part where I have to list the TLDs to look for. I would like to skip that step so I can allow for ANY TLD.
