How unique will the strings produced by the following code be?

substr(md5(md5(time())),0,10);

I hash the hash 'cause the line right before this one hashes time() as well and I want to make sure the two strings are not equivalent.

The way I see it, there are (26+10)^10 (or 3,656,158,440,062,976 -- 3.6 QUADrillion) possibilities. Given that the site expects only about 10,000 users, I'd be pretty safe ensuring that each user has a unique (and pseudorandom) string associated with their account. Safe assumption?

My only worry is that the md5 algorithm may return two hashes that have the same 10 character substring to start. Any thoughts on that?

One work-around (kinda) for that would be to set the user's registration key to, say, 0000000000 once they have completed registration. That will help cut down the possibility of two users who have not completed the registration process having the same registration key at any given time.

    I hash the hash 'cause the line right before this one hashes time() as well and I want to make sure the two strings are not equivalent.

    Not much point, since the 2 original hashes should be very different.
    If they somehow are the same, then your additional hashes will also be the same.

    The way I see it, there are (26+10)^10 (or 3,656,158,440,062,976 -- 3.6 QUADrillion) possibilities.

    Actually, there are 2^128 possible hashes, which is greater than 36^10.
    On the other hand, the number of possible hashes is greatly reduced since you're hashing the output of time(), which doesn't vary very much.
    If the attacker knows the hour at which the hash was computed, he/she has only 3,600 pre-images to choose from (one per second).

    Actually, what are you trying to do?
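
To make the point about time() concrete, here is a minimal sketch of how little work it takes to recover such a key when the hour of registration is known. The function names and the example timestamp are made up for illustration; only the substr(md5(md5(time())),0,10) construction comes from the original post.

```php
<?php
// Key generation as in the original post: only the timestamp feeds the hash.
function make_key(int $timestamp): string
{
    return substr(md5(md5((string) $timestamp)), 0, 10);
}

// Hypothetical attacker helper: try every second in a known time window.
function recover_timestamp(string $key, int $windowStart, int $windowEnd): ?int
{
    for ($t = $windowStart; $t <= $windowEnd; $t++) {
        if (make_key($t) === $key) {
            return $t;
        }
    }
    return null;
}

$hourStart = 1100000000;            // arbitrary example value, not from the thread
$issuedAt  = $hourStart + 1234;     // pretend registration time inside that hour
$key       = make_key($issuedAt);

// The attacker knows only the hour, so at most 3600 candidates are tried.
$found = recover_timestamp($key, $hourStart, $hourStart + 3599);
var_dump($found !== null && make_key($found) === $key);   // bool(true)
```

The loop finishes almost instantly, which is why a key derived only from the clock should be treated as guessable.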

      if the inner md5 returns the same result twice you will use the outer md5 for the very same data and therefore will end up with the very same result

      md5(<something>) = XYZ
      md5(<something_else>) = XYZ

      md5(md5(<something>)) = md5(XYZ) = ABC
      md5(md5(<something_else>)) = md5(XYZ) = ABC

      got it?
      if you can do something like

      substr(md5(time().$id_of_newly_created_user), 0, 10)

      now you have a unique part even if time() is not unique

      btw: maybe microtime() is more suitable?
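
The argument that hashing twice cannot undo a collision can be checked with a toy example. Real md5 collisions are awkward to type out, so this sketch assumes a deliberately weak stand-in, tiny_hash(), which keeps only the first two hex characters of md5() and so has just 256 possible outputs. The helper name and the inputs are hypothetical.

```php
<?php
// Toy hash with only 256 outputs, so a collision is easy to find.
// Assumption: it stands in for md5() purely for illustration.
function tiny_hash(string $s): string
{
    return substr(md5($s), 0, 2);
}

// Find two different inputs with the same tiny_hash() value.
$seen = [];
$a = $b = null;
for ($i = 0; $i <= 256; $i++) {     // 257 inputs, 256 outputs: a collision is guaranteed
    $input = "input$i";
    $h = tiny_hash($input);
    if (isset($seen[$h])) {
        $a = $seen[$h];
        $b = $input;
        break;
    }
    $seen[$h] = $input;
}

var_dump($a !== $b);                                               // different inputs
var_dump(tiny_hash($a) === tiny_hash($b));                         // same inner hash
var_dump(tiny_hash(tiny_hash($a)) === tiny_hash(tiny_hash($b)));   // so the same outer hash
```

All three lines print bool(true): once the inner results are equal, the outer call sees identical input and has no way to tell them apart.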

        The goal of all this is to ensure that users who begin the registration process cannot have their accounts hijacked by a third party who guesses a currently registered but non-validated account identifier.

        For instance.. say I sent the user a link via email to validate their email address and then finish up registering their account. They would receive a link such as www.domain.com/confirm.php?key=k7eh736sdg. Using something like confirm.php?userid=1345 would be considerably easier to predict as the user IDs are sequential. I realize that the problem is minor, but just trying to lock things down as much as I can.

        As for md5(x) and md5(md5(x)) returning the same thing, this I do not understand nor do I see evidence for this. I mean, a quick test of md5("samplestring") and md5(md5("samplestring")) gives me "ba5759e55b83e28b84c717b95fd7bfd3" and "d75392ba3a6215f450ef12d4216c7acd" respectively. Perhaps I'm not getting something?

        And mrhappiness, per your suggestion, I've changed it to microtime().$user_email_address for the time being until a better way of doing this comes along. Thanks for the tip.
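
For what it's worth, one candidate for a "better way" that nobody in this thread proposed is to skip the clock entirely and take the key from the operating system's random source. The sketch below assumes PHP 7 or later (for random_bytes()); make_confirmation_key() and the 16-byte default are illustrative choices, and the URL is the placeholder from the post above.

```php
<?php
// Hypothetical helper: build a confirmation key from the OS random source
// instead of from the clock. Requires PHP 7 or later for random_bytes().
function make_confirmation_key(int $bytes = 16): string
{
    return bin2hex(random_bytes($bytes));   // 2 hex characters per byte
}

$key  = make_confirmation_key();            // 32 hex characters, 128 bits
$link = 'http://www.domain.com/confirm.php?key=' . $key;

// When the user clicks the link, compare the stored and submitted keys
// in constant time.
$submitted = $key;                          // stand-in for $_GET['key']
var_dump(hash_equals($key, $submitted));    // bool(true)
```

With this approach the key carries no information about when the account was created, so knowing the registration time does not help an attacker.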

          Originally posted by Ravenous
          As for md5(x) and md5(md5(x)) returning the same thing, this I do not understand nor do I see evidence for this. I mean, a quick test of md5("samplestring") and md5(md5("samplestring")) gives me "ba5759e55b83e28b84c717b95fd7bfd3" and "d75392ba3a6215f450ef12d4216c7acd" respectively. Perhaps I'm not getting something?

          you use md5 twice because you fear if you use it only once you might get two identical results, right?
          suppose you have two strings which have the very same md5 hash:
          md5("samplehash") = "ba5759e55b83e28b84c717b95fd7bfd3"
          if you have another string (e.g. "samplehash2") which has the very same hash "ba5759e55b83e28b84c717b95fd7bfd3", you calculate the md5 hash of "ba5759e55b83e28b84c717b95fd7bfd3" in both cases and of course will receive the same result twice
          so it isn't more secure to use md5 more than once

          And mrhappiness, per your suggestion, I've changed it to microtime().$user_email_address for the time being until a better way of doing this comes along. Thanks for the tip.

          assuming $user_email_address is unique (you don't allow multiple accounts sharing the same mail address, do you?) you have a unique part in the string you calculate the md5 hash from and that should be suitable
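
A rough sketch of that idea, assuming the e-mail address is unique per account. make_key_for_user() and the example addresses are hypothetical; the microtime().$user_email_address input and the 10-character cut come from the posts above.

```php
<?php
// Mix a value that is unique per account (here the e-mail address)
// into the string that gets hashed.
function make_key_for_user(string $time, string $userEmailAddress): string
{
    return substr(md5($time . $userEmailAddress), 0, 10);
}

// Worst case: two registrations land on exactly the same microtime() value.
$now  = microtime();
$keyA = make_key_for_user($now, 'alice@example.com');
$keyB = make_key_for_user($now, 'bob@example.com');

var_dump($keyA !== $keyB);
```

Even with an identical time component the hashed strings differ, so the two keys will almost certainly differ as well. A clash in the first 10 characters is still possible in principle, just very unlikely.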

            If you want a unique ID, why not use uniqid() instead of replicating (parts of) that function?
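
A minimal sketch of what that looks like. Note that uniqid() is itself built from the current time in microseconds, and the PHP manual says it does not generate cryptographically secure values, so it addresses the uniqueness concern rather than the guessability one.

```php
<?php
$id     = uniqid();           // 13 hex characters derived from the current time
$longer = uniqid('', true);   // more_entropy = true appends extra characters

var_dump(strlen($id));        // int(13)
var_dump(strlen($longer) > strlen($id));
```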

            Originally written by laserlight
            On the other hand, the number of possible hashes is greatly reduced since you're hashing the output of time(), which doesn't vary very much.

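            And then taking only the first ten characters, thus reducing the number of possible hashes to only 16^10 = 2^40 = a shave over a trillion, since each character of an md5 hash is a hexadecimal digit (0-9, a-f) and not one of 36 alphanumerics.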
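
The numbers can be checked directly. This sketch assumes a 64-bit PHP build (so the integers fit) and, for the collision estimate, assumes the keys behave like uniformly random 10-character hex strings, which time-based keys do not. The 10,000-user figure is the one from the original question.

```php
<?php
$hexKeys   = 16 ** 10;   // 10 hexadecimal characters
$alnumKeys = 36 ** 10;   // 10 characters from a-z and 0-9, as assumed in the question

var_dump($hexKeys === 2 ** 40);   // bool(true)
echo $hexKeys, "\n";              // 1099511627776
echo $alnumKeys, "\n";            // 3656158440062976

// Birthday approximation: chance that any two of $users random keys collide.
$users      = 10000;
$pCollision = 1 - exp(-$users * ($users - 1) / (2 * $hexKeys));
printf("%.6f\n", $pCollision);    // 0.000045
```

So for 10,000 users an accidental clash among truly random 10-character hex keys is roughly a 1 in 22,000 event overall. The bigger problem is predictability of the input, not the size of the output space.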
