md5 passwords

sneakyimp · Nov 11, 2012

More food for thought on password strength:
http://xkcd.com/936/

Weedpacket wrote:
But still, I'm going off to write a password entropy calculator now.

Exciting! Please do let us know when you post.

To NZ_Kiwis: As bradgrafelman said, [man]md5[/man] and other hash functions are one-way. That's why they use the word "hash". When a chef cooks a hash, you can never get the original ingredients back. In the case of MD5, it has been determined that the algorithm tends to produce the same output for very distinct values (i.e., not "collision-resistant") and so its usefulness as an encryption tool is limited. In the case of your database, you will never be able to know exactly what the original password is because there is no way to reverse the process. You might, however, download a torrent file that has some matching password for each and every hashed password in your database. That is to say you might not find the user's original password, but you could find some other password that gives the exact same output when hashed using MD5.

My knowledge here is a bit thin, but the idea of using salted passwords means that you combine the password with some random string of characters before storing the hash and this would ostensibly protect your data system from hack attempts because even if your users used very simple passwords, you would be combining them with some random string before hashing.

As for your question "can I just copy the old md5 hashes to my new table" yes of course you can do that. The question I think you mean to ask is "will my users still be able to log in?" That depends on whether you deal with this new table in the same way. If the new table is accompanied by a new way of hashing passwords, then you probably won't have much luck in just copying hashed passwords from one database to another. You'd need to look into the details of how the passwords are handled in the new system.

Bonesnap · Nov 14, 2012

I've been pretty interested in the whole password/hashing/salting/storing/cracking thing for a while now, and have even created my own amateur hash dictionary (pretty small right now, only about ~306 million entries). I consider it a hobby; I really like following all the security breaches and the companies that are leaked.

I usually stick with SHA-512 as I see no reason to use anything "smaller" (like SHA-256).

Something I'd like to comment about salting though: While the video posted showed it was very easy to crack the passwords even though they were salted, the cracker has to first know how the salt is being applied, as Weedpacket alluded to. Unless the cracker has access to your source code, he may never crack your password even if it's password. While source code has been leaked before, more often than not it's the database that gets breached and therefore the attacker really only has the salt and the hash. I make my salt as long as the hash's output, so in the case of SHA-512, 128 characters. From there I can do a number of things. The most common thing is to either append the salt to the beginning or end of the password and then hash that. But what about splitting the salt in half and appending one half to the beginning and one to the end? Or using a reverse string function on it and appending to the beginning or end? The ideas are endless. I usually do something obscure like that, so even if my database is leaked there's very little chance the attacker is going to know how the salt is applied.

This of course doesn't mean you can choose weak passwords; a strong password is always your best bet for staying secure, but a nice salting method can go a long way if your database is leaked.

sneakyimp · Nov 14, 2012

Bonesnap;11017401 wrote:
I've been pretty interested in the whole password/hashing/salting/storing/cracking thing for a while now, and have even created my own amateur hash dictionary (pretty small right now, only about ~306 million entries). I consider it a hobby; I really like following all the security breaches and the companies that are leaked.

If you have an algorithm you are using to generate the dictionary randomly, I'd be interested in knowing how often you encounter a collision in expanding your DB. It's my understanding that MD5 producese a 128-bit hash which means there are 3.4 10³⁸ different possibilities -- vastly larger than the 3.06 10⁸ entries that you have. I also wonder if there's any theory out there about what algorithms are able to explore MD5 space faster than others.

Bonesnap;11017401 wrote:
Something I'd like to comment about salting though: While the video posted showed it was very easy to crack the passwords even though they were salted, the cracker has to first know how the salt is being applied, as Weedpacket alluded to. Unless the cracker has access to your source code, he may never crack your password even if it's password. While source code has been leaked before, more often than not it's the database that gets breached and therefore the attacker really only has the salt and the hash.

If your password is "password" then you will be cracked eventually by a dictionary attack. I'd be willing to bet that any exploit kit would start first with a set of common default passwords. As for depending on your source code to protect you, I think it's important to consider Kerckhoff's Principle which suggests that the keys you use matter tremendously more than the mechanism of encryption. Considering for a moment the number of different ways that one might prepend or append or intersperse a salt with a password, it seems apparent that we're talking about a much smaller number of possibilities than the vast number space of a 128-bit key.

Derokorian · Nov 14, 2012

sneakyimp;11017423 wrote:
It's my understanding that MD5 producese a 128-bit hash

Hey sneaky, you should check out this comment on php.net it shows md5's length as 32 not 128 (hence it being discouraged for secure usage).

Edit: As for dictionary attacks, that will only work if they are using your login mechanism to spam logins, however, good security would say If request for invalid credentials happen more than X times within Y minutes from a single source, block that source from continued attempts for Z minutes (where z increases every time a single source is added to the list). Of course that's my style, it may not be everyone's.

sneakyimp · Nov 14, 2012

According to both the wikipedia page on MD5 and this RFC, MD5 is a 128 bit message digest. The 32-character string you refer to has a hexadecimal digit for each character -- IIRC, each individual hex digit represents 4 bits (2⁴ = 16). A 32-digit hexadecimal number is therefore able to represent 128 bits (32 * 4 = 128).

I believe the issue with MD5 is that, although the digest output of 128 bits represents a very large number space, the algorithm itself is flawed in that it doesn't really utilize the vastness of that number space. I.e., the algorithm is not "collision-resistant" and vastly different inputs result in outputs that occupy only a small fraction of the available number space. This makes it easy to find some substitute for a given password that has the exact same MD5 hash. Salting a password mitigates the risk associated with this substitution in a way that I don't fully understand.

You are correct that a dictionary attack must spam logins, but any brute force approach must also spam logins. Your approach wherein you ban repeated login failures is an effective countermeasure to login spamming. I typically install fail2ban on my servers to prevent such spamming via ssh. Still, I'd be willing to bet that if your password is password that any exploit kit worth its salt would try that password in the first 20 attempts. Salt it however you like, it's going to be one of the first passwords tried if one must resort to brute force.

dalecosp · Nov 14, 2012

sneakyimp;11017433 wrote:
I typically install fail2ban on my servers to prevent such spamming via ssh.

Much better to disallow password authentication entirely and utilize key-based authentication, yes?

sneakyimp · Nov 14, 2012

dalecosp;11017441 wrote:
Much better to disallow password authentication entirely and utilize key-based authentication, yes?

Yes, that has been my approach where practicable -- with fail2ban and iptables on top for extra OCD-ness.

Bonesnap · Nov 14, 2012

sneakyimp;11017423 wrote:
If you have an algorithm you are using to generate the dictionary randomly, I'd be interested in knowing how often you encounter a collision in expanding your DB. It's my understanding that MD5 producese a 128-bit hash which means there are 3.4 10³⁸ different possibilities -- vastly larger than the 3.06 10⁸ entries that you have. I also wonder if there's any theory out there about what algorithms are able to explore MD5 space faster than others.

I don't have an algorithm but I do have a method. I have spent considerable time compiling massive dictionaries ranging from English language to common passwords to celebrities and apply varying incremental "mangling" to each entry. Also, I have never encountered a collision. 306 million is quite literally a drop in the bucket compared to the total combinations. Crackstation has a dictionary of over 15 billion entries and from what I understand has never run into a collision.

sneakyimp;11017423 wrote:
If your password is "password" then you will be cracked eventually by a dictionary attack. I'd be willing to bet that any exploit kit would start first with a set of common default passwords.

Dictionary attacks are almost useless against a salted password, the video notwithstanding. The only reason the video looks successful is because the attacker knew exactly how the salt was applied. I don't remember, were all the salts unique from that video? Also, all those passwords were very simple. If I recall, most of them were all digits or very common words. If you applied those passwords against a larger dictionary the time would increase substantially.

You are right, though: using a very simple password, even if salted, is much more likely to be cracked than a strong one (also salted).

sneakyimp;11017423 wrote:
As for depending on your source code to protect you, I think it's important to consider Kerckhoff's Principle which suggests that the keys you use matter tremendously more than the mechanism of encryption. Considering for a moment the number of different ways that one might prepend or append or intersperse a salt with a password, it seems apparent that we're talking about a much smaller number of possibilities than the vast number space of a 128-bit key.

I'm not advocating using an insecure password; however, applying a salt increases its effectiveness quite a bit, and applying a non-standard method could make it nearly impossible. Remember, password attackers, like common thieves, go for the easy targets and avoid the difficult ones. Most won't even attempt a salted password because it's not worth it, and instead go for the unsalted ones.

Derokorian;11017425 wrote:
Edit: As for dictionary attacks, that will only work if they are using your login mechanism to spam logins, however, good security would say If request for invalid credentials happen more than X times within Y minutes from a single source, block that source from continued attempts for Z minutes (where z increases every time a single source is added to the list). Of course that's my style, it may not be everyone's.

Brute force dictionary attacks are usually applied to a leaked database, where the hashes can be sorted and unique entries removed. Such being the case, there are no arbitrary limits imposed on the attacker. In reality, no serious attacker tries to brute force through a website login form, even if there weren't any restrictions; it would simply take too long.

Derokorian · Nov 14, 2012

Interesting points of information bone. As per usual I don't know the method for attacks, and made assumptions. I guess I don't understand how you can use a dictionary attack on the database unless you know the salt(s), the way they are applied and all intermediate steps between inputting a password in a form, and storing it in the db. For example, is it as simple as:

$pass = md5(APPSALT . $_POST['password'] . $row['USER_SALT']);

Or is it possibly more complex, like one I used to use:

$pass = hashPass($username,$usersalt,$_POST['pass']);
function hashPass($user,$salt,$pass) {
   $str = substr(APPSALT,0,ceil(strlen(APPSALT)/2)) . $salt . $user . $pass . $salt . substr(APPSALT,ceil(strlen(APPSALT)/2));
   return hash("sha512",hash("sha512",$str));
}

Seems to me, without any of those inner workings there shouldn't be anyway to use a dictionary attack to crack a hashed password... But maybe I'm missing something (as always) about how it works.

Bonesnap · Nov 14, 2012

The method I would use would be closer to that of your second example.

Derokorian;11017461 wrote:
Seems to me, without any of those inner workings there shouldn't be anyway to use a dictionary attack to crack a hashed password

This only applies if the password is salted. If it's not then a dictionary attack (a good one) can be very effective.

One other thing about salting: it severely mitigates the effectiveness of rainbow tables as well, almost nullifying them completely.

sneakyimp · Nov 14, 2012

Bonesnap;11017457 wrote:
I have spent considerable time compiling massive dictionaries ranging from English language to common passwords to celebrities and apply varying incremental "mangling" to each entry.

I wonder if it might be possible to use a simpler algorithm to generate an MD5 dictionary more rapidly simple by taking the 93 printable ASCII characters and generating strings of some length between 8 and 20 chars by just counting from 0 to N in base 93. After all, the point isn't really to determine some original password, but to determine an equivalent string that accomplishes the same hash, right? Or is it? I think the salting algorithm might matter here. I wonder if salting might obviate the utility of an md5 dictionary entirely.

Bonesnap;11017457 wrote:
Dictionary attacks are almost useless against a salted password, the video notwithstanding.

Not if you apply the dictionary attack at the point of login, in which case the app is applying the salt for you.

Bonesnap;11017457 wrote:
applying a non-standard method could make it nearly impossible.

I think Kerckhoff would disagree I'm not sure how to formalize mathematically how much trickier non-standard salting methods are, but I would posit that each possible different salting technique contributes at most one bit to the strength of one's password [i.e., it provides exactly one alternative to the simple prepend or append approach] -- and then only if the source code is not also compromised.

Also, if you are applying the attack at the point of login, the salting method is irrelevant. The point of both hashing and salting is merely to make it more difficult to extract the actual password from stored data that has been exposed.

Bonesnap;11017457 wrote:
Brute force dictionary attacks are usually applied to a leaked database, where the hashes can be sorted and unique entries removed. Such being the case, there are no arbitrary limits imposed on the attacker. In reality, no serious attacker tries to brute force through a website login form, even if there weren't any restrictions; it would simply take too long.

I don't have any hacker cred to know under what conditions people most often use dictionary attacks, but I've seen long-running and persistent brute force attacks directed against ssh on a dedicated server I had once -- the basic approach appeared to be attempting login using common usernames like root, ubuntu, mysql, www-data, etc. With a botnet and/or multithreaded applications, it is possible to bring quite a bit of cracking power to bear on an unprepared server and the extra time lost in HTTP latency might well be offset by the extra time required to deal with some kind of black box salting algorithm.

I think the critical advantage of a dictionary attack -- regardless of where it is applied -- is that most passwords are not nonsense but are directly derived from one or two words in the dictionary. For an 8-letter password, this means a dramatic reduction in the number of possible passwords a victim might reasonably use. There are 5.6 * 10 ^ 15 different random 8-char sequences of the 93 printable ASCII characters. I don't know the number of letters in English with 8 letters or less but it's maybe 10 ^5. Allowing 100 variations per word for common substitutions (e.g., CamelCase or L3tt3r Substitution or punctuation) still yields only about 10 ^ 7 possibilities -- a dramatic difference.

Weedpacket · Nov 15, 2012

Bonesnap wrote:
Also, I have never encountered a collision.

A mechanism for constructing MD5 collisions was presented in 2005; however it does involve the attacker creating both plaintexts.

Bonesnap wrote:
306 million is quite literally a drop in the bucket compared to the total combinations. Crackstation has a dictionary of over 15 billion entries and from what I understand has never run into a collision.

(hehe, he said "literally a drop the bucket"). The attack shown was against a known salting method (ASP.NET's Membership provider), and in the video tested hashes at a rate of 259 million/second (it could have been a couple of billion tests per second but for bottlenecks in generating cases to pass to the GPU) for 45 minutes and recovered 24710 of 39384 passwords (with unique salts). Mind you, the dictionary only had 23,685,601 entries.

sneakyimp wrote:
I wonder if it might be possible to use a simpler algorithm to generate an MD5 dictionary more rapidly simple by taking the 93 printable ASCII characters and generating strings of some length between 8 and 20 chars by just counting from 0 to N in base 93. After all, the point isn't really to determine some original password, but to determine an equivalent string that accomplishes the same hash, right?

There's no reason to assume that a collision has the same or even a remotely similar length to the target. (In fact, I'm not aware of a proof that any given hash does have (at least) two colliding plaintexts, though I'd be surprised if it's not so - it would imply that the MD5 algorithm is pretty poor at uniformly distributing plaintexts across the hash space.) Mind you, your "just counting" is equivalent to a dictionary containing 2.36785×10³⁹ words - or about 4×10²⁸ terabytes.

sneakyimp wrote:
I think the critical advantage of a dictionary attack -- regardless of where it is applied -- is that most passwords are not nonsense but are directly derived from one or two words in the dictionary. For an 8-letter password, this means a dramatic reduction in the number of possible passwords a victim might reasonably use. There are 5.6 * 10 ^ 15 different random 8-char sequences of the 93 printable ASCII characters. I don't know the number of letters in English with 8 letters or less but it's maybe 10 ^5. Allowing 100 variations per word for common substitutions (e.g., CamelCase or L3tt3r Substitution or punctuation) still yields only about 10 ^ 7 possibilities -- a dramatic difference.

Obligatory xkcd reference here

Bonesnap · Nov 15, 2012

sneakyimp;11017465 wrote:
I wonder if it might be possible to use a simpler algorithm to generate an MD5 dictionary more rapidly simple by taking the 93 printable ASCII characters and generating strings of some length between 8 and 20 chars by just counting from 0 to N in base 93. After all, the point isn't really to determine some original password, but to determine an equivalent string that accomplishes the same hash, right? Or is it? I think the salting algorithm might matter here. I wonder if salting might obviate the utility of an md5 dictionary entirely.

What you're referring to is a rainbow table, and they already exist for all possible combinations of ASCII characters up to eight characters for MD5. If you look hard enough on the Internet you can find them. I have.

The issue with rainbow tables is simply that passwords that people use don't "fit the bill", as sneakyimp basically said. So while you will have a massive dictionary of precomputed hashes, your success rate will be far less than a carefully constructed dictionary attack.

sneakyimp;11017465 wrote:
Not if you apply the dictionary attack at the point of login, in which case the app is applying the salt for you.

Agreed, but simply put: it's too slow. When you compare the amount of time it takes even a script to enter in credentials compared to attacking a hash list, it's night and day. Even if it took half a second, there's millions of combinations "wasted" - and that's assuming that the login form doesn't have security measures in place. Most do nowadays. They're really not a target for password attackers.

sneakyimp;11017465 wrote:
I think Kerckhoff would disagree I'm not sure how to formalize mathematically how much trickier non-standard salting methods are, but I would posit that each possible different salting technique contributes at most one bit to the strength of one's password [i.e., it provides exactly one alternative to the simple prepend or append approach] -- and then only if the source code is not also compromised.

He can disagree all he wants but the fact remains if you don't know how the salt is applied then you're out of luck. Your only two options are to either 1) keep trying different salting methods until you find one that works but that presents a problem: what if you actually use the right salting method but the string you're testing isn't in your dictionary? You would think you had the wrong one and move on. Sure, you could test each and every single hash, but that exponentially adds to your time. Why not just attack someone who puts up much less of a fight?

And 2) Do a straight up brute force attack trying every single possible combination, but we both know that could take centuries or more.

Nothing in the world is absolutely 100% secure; everything has a flaw. So let's remember the gritty truth about security: you don't have to be 100% secure, but you better hope to be more secure than the guy standing next to you.

EDIT: Yes, if your source code is compromised then there's nothing you can do. If you drop your house key and someone picks it up, then your fancy lock means nothing. One thing to keep in mind though: as the "designer" of the system, I don't have control over people's passwords (short of putting restrictions on them but that's generally considered a bad idea), so I will make my system as strong as possible. I still think it is better to have a non-standard salting method. Wouldn't you agree?

sneakyimp;11017465 wrote:
I don't have any hacker cred to know under what conditions people most often use dictionary attacks, but I've seen long-running and persistent brute force attacks directed against ssh on a dedicated server I had once -- the basic approach appeared to be attempting login using common usernames like root, ubuntu, mysql, www-data, etc. With a botnet and/or multithreaded applications, it is possible to bring quite a bit of cracking power to bear on an unprepared server and the extra time lost in HTTP latency might well be offset by the extra time required to deal with some kind of black box salting algorithm.

In my (limited) experience and from what I've read, the overwhelming majority of dictionary attacks and brute force attacks are on a hash list or some other kind of list that the attacker can manipulate.

EDIT: By limited experience I do not mean I have hacked, cracked, or otherwise tried to gain access to a system or someone's account without permission. Just wanted to clarify that!

sneakyimp;11017465 wrote:
I think the critical advantage of a dictionary attack -- regardless of where it is applied -- is that most passwords are not nonsense but are directly derived from one or two words in the dictionary. For an 8-letter password, this means a dramatic reduction in the number of possible passwords a victim might reasonably use. There are 5.6 * 10 ^ 15 different random 8-char sequences of the 93 printable ASCII characters. I don't know the number of letters in English with 8 letters or less but it's maybe 10 ^5. Allowing 100 variations per word for common substitutions (e.g., CamelCase or L3tt3r Substitution or punctuation) still yields only about 10 ^ 7 possibilities -- a dramatic difference.

Agreed, which is why dictionary attacks are far more likely to succeed versus a rainbow table.

Weedpacket;11017471 wrote:
A mechanism for constructing MD5 collisions was presented in 2005; however it does involve the attacker creating both plaintexts.

I'm not denying collisions exist or that it's possible, I was just saying that I have personally never encountered one during the creation of my dictionary (or otherwise).

Weedpacket;11017471 wrote:
(hehe, he said "literally a drop the bucket"). The attack shown was against a known salting method (ASP.NET's Membership provider), and in the video tested hashes at a rate of 259 million/second (it could have been a couple of billion tests per second but for bottlenecks in generating cases to pass to the GPU) for 45 minutes and recovered 24710 of 39384 passwords (with unique salts). Mind you, the dictionary only had 23,685,601 entries.

I don't quite understand what you're getting at here. 306 million is a fraction of a fraction of a percent of the total possible combinations of MD5, and even less for SHA-1.

Also, you said it right there. A known salting method. That's the weakness.

And to be honest, 259 million/second isn't impressive. My dual 8800 ULTRAs do ~950 million/second when I use IGHashGPU. Even a few billion is nothing nowadays. You should check out a Whitepixel. They were able to obtain 33.1 billion/second with 4 GPUs running together. Talk about horsepower, eh?

Weedpacket · Nov 15, 2012

Bonesnap wrote:
I don't quite understand what you're getting at here. 306 million is a fraction of a fraction of a percent of the total possible combinations of MD5, and even less for SHA-1.

http://theoatmeal.com/comics/literally

Also, you said it right there. A known salting method. That's the weakness.

Which, if you use a third- or second-party authentication system, or the choice of hash is otherwise outside your control, is likely to be the case (see the list of algorithms that the cited software targets).

http://board.phpbuilder.com/showthread.php?10386947-md5-passwords&p=11016865&viewfull=1#post11016865

Bonesnap · Nov 15, 2012

Weedpacket;11017493 wrote:
http://theoatmeal.com/comics/literally

Heh, touche.

Weedpacket;11017493 wrote:
Which, if you use a third- or second-party authentication system, or the choice of hash is otherwise outside your control, is likely to be the case (see the list of algorithms that the cited software targets).

http://board.phpbuilder.com/showthread.php?10386947-md5-passwords&p=11016865&viewfull=1#post11016865

Fair enough. I still advocate choosing a strong password as well.

Bonesnap · Nov 15, 2012

Double post. Thanks, Internet!

dalecosp · Nov 15, 2012

sneakyimp;11017447 wrote:
Yes, that has been my approach where practicable -- with fail2ban and iptables on top for extra OCD-ness.

Ah, I understand that. I tend to tcpwrap mine, warnings in the config_file be d-----d. So far, been a long time since I got bit by that.

md5 passwords

Ssneakyimp

BBonesnap

Ssneakyimp

DDerokorian

Ssneakyimp

dalecosp

Ssneakyimp

BBonesnap

DDerokorian

BBonesnap

Ssneakyimp

Weedpacket

BBonesnap

Weedpacket

BBonesnap

BBonesnap

dalecosp