Hi guys,

After a lot of messing around with parse_url, I found publicsuffix.org, which offers a list of all the top-level and second-level domain suffixes.

I'm trying to write some code that will return "phpbuilder" when given phpbuilder.com, phpbuilder.co.uk or phpbuilder.com/page/page

So I'm using parse_url, and then I'm going to check the TLD list from publicsuffix.org to get the site name.
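
One thing to watch with parse_url: it only reports a host when the input has a scheme, so a bare "phpbuilder.com/page/page" comes back as a path. A minimal sketch of a workaround follows; `extract_host` is a hypothetical helper name, and stripping a leading "www." is an assumption about what is wanted here.

```php
<?php
// Hypothetical helper: pull the host out of user input that may lack a scheme.
function extract_host(string $input): ?string
{
    $input = trim($input);
    // parse_url() only fills in the host when a scheme is present,
    // so prepend one if the input does not start with "something://".
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $input)) {
        $input = 'http://' . $input;
    }
    $host = parse_url($input, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // malformed input or no host found
    }
    // Assumption: a leading "www." label is not part of the site name.
    return preg_replace('/^www\./', '', strtolower($host));
}

echo extract_host('phpbuilder.com/page/page'), "\n";         // phpbuilder.com
echo extract_host('http://www.phpbuilder.com/board/'), "\n"; // phpbuilder.com
```

Other subdomains (anything other than "www.") are left in place by this sketch and would still need to be handled after the suffix is removed.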

I'm going to convert the list into a MySQL database:

#ID
TLD

What would be the best way to search the database when I am trying to do checks like this?

Find "phpbuilder.com" and see if ".com" is in the database?

Thanks for any help. 🙂

    Simplest way, possibly somewhat flawed...
    Store whatever parts you need from the lower levels, phpbuilder in this case, in one table, and all TLDs in another table. Then use one link table to join them together.
    So, assuming phpbuilder exists in combination with both top-level domains .com and .org, you'd get two results for

    SELECT tld FROM tld
    INNER JOIN ll_to_tld ON ll_to_tld.tld_id = tld.id
    INNER JOIN ll ON ll.id = ll_to_tld.ll_id
    WHERE ll.domain_part = 'phpbuilder'
    

    Then you will have to decide how to handle something like phpbuilder.co.uk. Should you still store just phpbuilder, or should you store phpbuilder.co? Or should you set up your schema to deal with hierarchical data for real? This article may be of some use in that case.

      Thanks for the reply johanafm, but I think I explained myself badly 🙁

      I'm going to have a database containing the TLDs:

      #ID	TLD
      1	.com
      2	.co.uk
      3	.org.uk
      4	.us
      5	.me
      6	.tv
      etc...
      

      Then what I really need is to be able to search the database. I will have a string (a domain) from the user, for example: http://www.phpbuilder.com/board/

      I will use parse_url to get the domain down to "phpbuilder.com". Then, using the database of TLDs, I would like to match the ".com" bit so I know that the site name is actually just "phpbuilder".

      If there were not so many TLDs, the way I would think to do it is this:

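      $site = 'http://www.phpbuilder.com/board/'; // or $_GET['site']
      $host = parse_url($site, PHP_URL_HOST);      // "www.phpbuilder.com"
      $host = preg_replace('/^www\./', '', $host); // "phpbuilder.com"
      $pos = strrpos($host, '.com');               // position where the suffix starts
      echo substr($host, 0, $pos);                 // "phpbuilder"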
      

      But as there are loads of suffixes ('.com', '.co.uk', etc.), hard-coding one like this is not practical.

      Thanks for any help again everyone!

        Ahh, I see. Just turn the ordinary use of LIKE around

        WHERE 'phpbuilder.com' LIKE CONCAT('%',tld)
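
For readers who think more easily in PHP than SQL, the reversed LIKE above amounts to asking, for each row, "does the host end with this tld?". A rough sketch of the same test follows; the `$tlds` array is a made-up stand-in for the tld table, and `ends_with_tld` and `matching_tlds` are hypothetical names.

```php
<?php
// Made-up sample rows standing in for the tld table.
$tlds = ['.com', '.co.uk', '.org.uk', '.uk', '.us'];

// True when $host ends with $tld, roughly what
// '<host>' LIKE CONCAT('%', tld) tests for each row.
function ends_with_tld(string $host, string $tld): bool
{
    $len = strlen($tld);
    return $len > 0 && $len <= strlen($host) && substr($host, -$len) === $tld;
}

// Collect every suffix in $tlds that $host ends with, in list order.
function matching_tlds(string $host, array $tlds): array
{
    $matches = array_filter($tlds, function ($tld) use ($host) {
        return ends_with_tld($host, $tld);
    });
    return array_values($matches);
}

print_r(matching_tlds('phpbuilder.co.uk', $tlds)); // matches .co.uk and .uk
```

Note that more than one suffix can match the same host, so some rule is still needed to pick between them.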
        

          Thanks again john 🙂

          That is almost working perfectly, except the order of the returned results is not right 🙁

          The query I'm using is:

          SELECT tld AS bob
          FROM `tbl_tld` 
          WHERE 'phpbuilder.co.uk' LIKE CONCAT( '%', tld ) 
          ORDER BY bob DESC
          LIMIT 5
          

          If the quoted domain ends in '.co.uk' (as above)

          The results would be:

          1: .co.uk
          2: .uk

          -- This is right!

          BUT... if the quoted domain ends in '.org.ae' (e.g. 'phpbuilder.org.ae'), the result would be:

          1: .ae
          2: .org.ae

          -- Which is the wrong order.

          Using ASC or DESC will just swap the order round, so one case is always wrong.

          Can anyone help with how to order the CONCAT/LIKE results so the closest match comes first? '.org.ae' should match 'phpbuilder.org.ae' better than '.ae' does.

          Thanks again.

            Strange thing I see is that even though the results are ordered wrong (with either ASC or DESC), phpMyAdmin seems to see which is the best result, as it has a small orange border around it.

            So phpMyAdmin can see which is the best result; how do I get that one to the top?

              Well, I do not use phpmyadmin, and I fail to see how it would know what a "best result" ever is for any generic query.

              SELECT sexual_preferences FROM partner

              Now, what can be considered as a best result?

              As for your specific query, both of your examples are ordered in the same (ASC) order, not DESC as your code suggests or in different ways as you claim.

              For the first example, .co.uk vs .uk, you need to look at two characters from each string to decide the ordering. Here they are in ascending order:
              .c
              .u

              For the second example, you once again need look only at the two first characters. Once again in ascending order:
              .a
              .o

              Which leads me to believe you do not want either ASC or DESC ordering, but some kind of custom ordering.
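
One candidate for that custom ordering is suffix length: of all the suffixes a host ends with, the longest is the most specific. In MySQL this would presumably mean sorting on something like CHAR_LENGTH(tld) DESC instead of on the string itself. Below is a sketch of the same idea in PHP, not a tested solution for the full publicsuffix.org list; `longest_matching_tld` and `site_name` are hypothetical names and the `$tlds` array is made-up sample data.

```php
<?php
// Hypothetical helper: return the longest suffix in $tlds that $host ends with.
function longest_matching_tld(string $host, array $tlds): ?string
{
    $best = null;
    foreach ($tlds as $tld) {
        $len = strlen($tld);
        // Skip suffixes that are empty, too long, or do not match the end of $host.
        if ($len === 0 || $len > strlen($host) || substr($host, -$len) !== $tld) {
            continue;
        }
        // Keep the longest match seen so far.
        if ($best === null || $len > strlen($best)) {
            $best = $tld;
        }
    }
    return $best;
}

// Hypothetical helper: strip the longest suffix, then keep the last remaining label.
function site_name(string $host, array $tlds): ?string
{
    $tld = longest_matching_tld($host, $tlds);
    if ($tld === null) {
        return null; // no known suffix matched
    }
    $rest = substr($host, 0, -strlen($tld)); // e.g. "www.phpbuilder"
    if ($rest === '' || $rest === false) {
        return null; // the host was nothing but the suffix
    }
    $labels = explode('.', $rest);
    return end($labels);
}

// Made-up sample rows standing in for the tld table.
$tlds = ['.ae', '.org.ae', '.uk', '.co.uk', '.com'];
echo site_name('phpbuilder.org.ae', $tlds), "\n"; // phpbuilder
echo site_name('phpbuilder.co.uk', $tlds), "\n";  // phpbuilder
```

Picking the longest match does not depend on how the rows happen to sort alphabetically, which is why neither ASC nor DESC on the string gave the right answer in both cases.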
