So I'm revamping two websites that require user registration and login, and I've been reading a bit about OpenID and OAuth, as these systems are used by most of the big guys like Google and Twitter. They strike me as pretty elaborate systems intended to delegate access to one's web application to other clients, web applications, or whatever, without requiring users to re-enter all their details. OpenID's description on Wikipedia says:

OpenID (OID) is an open standard and decentralized protocol by the non-profit OpenID Foundation that allows users to be authenticated by certain co-operating sites (known as Relying Parties or RP) using a third party service. This eliminates the need for webmasters to provide their own ad hoc systems and allowing users to consolidate their digital identities. In other words, users can log into multiple unrelated websites without having to register with their information over and over again

OAuth's entry says:

OAuth is an open standard to authorization. OAuth provides client applications a 'secure delegated access' to server resources on behalf of a resource owner. It specifies a process for resource owners to authorize third-party access to their server resources without sharing their credentials. Designed specifically to work with Hypertext Transfer Protocol (HTTP), OAuth essentially allows access tokens to be issued to third-party clients by an authorization server, with the approval of the resource owner, or end-user. The client then uses the access token to access the protected resources hosted by the resource server. OAuth is commonly used as a way for web surfers to log into third party web sites using their Google, Facebook or Twitter accounts, without worrying about their access credentials being compromised.

All this talk about third-party access and unrelated websites makes me think that these authentication protocols are either overkill or just poorly suited to the websites/applications that I'm working on. On the other hand, I feel like these are very good things to know about and we might also be positioning our web apps as some kind of API in the future.

The more I read, the more these technologies seem like a way for the big companies (Google, Twitter, etc.) to track what you use your ID for. For that reason, I dislike the idea of allowing users to use their Google/Twitter/Other accounts to connect to my site as it seems like privacy is compromised. Conversely, it seems unlikely that I need to go through the trouble of establishing some elaborate OAuth or OpenID scheme with my app at the center because I'm not a big player such that people would want to use my account to login elsewhere. Furthermore, I have a largish table of existing users each with their own username/email/password and I'd like to continue having them login with these credentials without interruption.

I suppose my ultimate question is: what is a state-of-the-art means of authentication in a PHP-driven website? Any details about roll-your-own systems you folks may have are most welcome. E.g., if you use password_hash, what algorithm do you specify? Does anyone ever use the default hash algo? Does anyone provide their own options to specify a cost and/or salt?

Implementations also quite welcome!

    We use SimpleSAMLphp at work for integrating our site into clients' sites, which I recommend for that particular use case, but may not be what you need for a more "normal" authentication purpose.

      NogDog;11044477 wrote:

      We use SimpleSAMLphp at work for integrating our site into clients' sites, which I recommend for that particular use case, but may not be what you need for a more "normal" authentication purpose.

      OK took a look at that link and it seems to me that SAML also involves three parties:
      1) principal - i.e., user who wants to do something
      2) service provider - i.e., website that offers a service like twitter
      3) identity provider - i.e., some third party that is able to authenticate the principal.

      I'm having this vague inkling that my organization might want to be both service provider and identity provider if we are planning to offer an API so that we can easily expand access to our system by letting our principals delegate access to their account to phone apps or web apps or some other such thing, but then I lose my train of thought over the whole thing.

      What do you mean by "integrating our site into clients' sites?"

        The application I work on is provided for use by our clients, either via API calls or by a GUI they can load into an IFrame. So the user needs to be logged into their site before they can use any feature on our site that inserts/updates data. Therefore, if such a request comes through and it does not include the requisite SAML assertion that indicates they're currently logged in, then they'll get redirected to the login URL specified for that client's SAML configuration (and hopefully it then successfully sends them back to us 🙂 ). I'm not a real expert on the details and fine print, as I mostly deal with it at the application level, writing interceptors to detect if we have the necessary credentials or not to proceed.

        If you want to provide an SSO (single sign-on) implementation for a set of discrete applications, I think SAML could be an option as a way to glue them together, perhaps? Not sure it's the best way in all cases, but it's good for apps that can't know about each other's inner workings.

          I'm starting to sense two big things:
          1) If we want people to be able to use Facebook, Google, Twitter, etc. to access our application server rather than registering with us, then we'll probably need to understand OAuth and establish some functionality to accept OAuth credentials/keys. This seems worrisome to me because of privacy concerns, but may in fact encourage more use of our system. I also expect we'd have to continue to maintain our own user/pass authentication scheme lest we alienate our large number of existing customers who originally registered with us.

          2) If we want our users to be able to create an API system where users can delegate access to their account to various clients (cell phones, other web apps, etc) then we'll probably need to adopt one of these schemes in order to allow reasonable management of a stable of clients because our users should be able to grant & revoke access to their account for each client without having to change their basic account credentials.
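To make point 2 a bit more concrete, here is a minimal sketch of per-client tokens that can be granted and revoked without touching the account password. It is not OAuth, just the underlying idea. The class name ClientTokenStore, its methods, and the in-memory array (standing in for a database table) are all made up for illustration.

```php
<?php
// Hypothetical sketch: one random token per (user, client) pair.
// Only a SHA-256 digest of each token is kept, never the token itself.
final class ClientTokenStore
{
    /** @var array digest => ['user' => int, 'client' => string] */
    private $tokens = [];

    // Issue a new token for one client of one user.
    public function grant(int $userId, string $client): string
    {
        $token = bin2hex(random_bytes(32));
        $this->tokens[hash('sha256', $token)] = ['user' => $userId, 'client' => $client];
        return $token; // handed to the client once; we keep only the digest
    }

    // Revoke every token issued to this client for this user.
    public function revoke(int $userId, string $client): void
    {
        foreach ($this->tokens as $digest => $row) {
            if ($row['user'] === $userId && $row['client'] === $client) {
                unset($this->tokens[$digest]);
            }
        }
    }

    // Return the user id a token belongs to, or null if unknown/revoked.
    public function resolve(string $token): ?int
    {
        $digest = hash('sha256', $token);
        return isset($this->tokens[$digest]) ? $this->tokens[$digest]['user'] : null;
    }
}
```

Revoking the phone app's token in this sketch leaves the web app's token and the user's password untouched, which is the property described above.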

          I don't understand either system very well and hope to dig in more. Sadly, I'm in the middle of some serious learning curves for other stuff at the moment.

            Just thought I'd toss in here the point that part of the reason for the prevalence of third parties in these protocols is so that identity authentication can work both ways: how does the user know that the entity claiming to be you currently asking for their login details really is you? Do they have more than just your word that you really are who you say you are?

              7 days later
              Weedpacket;11044491 wrote:

              Just thought I'd toss in here the point that part of the reason for the prevalence of third parties in these protocols is so that identity authentication can work both ways: how does the user know that the entity claiming to be you currently asking for their login details really is you? Do they have more than just your word that you really are who you say you are?

              That's a good point, but the more I think about this, the less that seems helpful in a security sense for a few reasons:
              * Is Google or some other OAuth kingpin playing watchdog to make sure that nobody's identity has been compromised? I've seen enough hacked Gmail and Facebook accounts to suspect otherwise.
              * Am I really that secure if some giant third party decides to stop approving any authentications for my site? Seems like the OAuth kingpins have great power over people who drink their OAuth kool aid.
              * Doesn't this seem more like a way to keep more tabs on folks and violate their privacy? E.g., if some OAuth kingpin gets contacted every time I use my credentials somewhere, that gives them a pretty good idea of what I get up to around the Internet.

              EDIT: I wonder what Bruce Schneier would have to say about this.

                So I find myself wondering if folks have some opinions about best practices for user registration, authentication, etc. I've searched around for authentication best practices and found the OWASP Authentication Cheat Sheet (https://www.owasp.org/index.php/Authentication_Cheat_Sheet), which looks pretty interesting, although I suspect it starts to look dated pretty quickly. MSDN also offers some guidelines, and I find it comical how this document simply instructs visitors to use one or another of various M$ technologies and tools.

                I'm also looking at migrating a user table which contains about 100,000 hashed passwords -- with two different types of hash already in there -- to a new user table and to use a more up-to-date password hashing algorithm. Does anyone have thoughts on how to keep one's password hashing algorithm current? Obviously, once a password is hashed you have to keep that old hash until the user authenticates again or you risk losing it. Is it acceptable to look at the hashed password to try to sniff out the original hash algorithm used to hash it? In my case, those containing a colon correspond to REALLY OLD (v1) hashes which are just a salt and md5. Those starting with $P$ correspond to OLD (v2) hashes which now must be updated to my yet-to-be-implemented state-of-the-art system which will probably just make use of password_hash with suitably modern settings.

                This reminds me of some questions I asked in my original post:

                sneakyimp wrote:

                if you use password_hash, what algorithm do you specify? Does anyone ever use the default hash algo? Does anyone provide their own options to specify a cost and/or salt?

                  I use default, because that's what is recommended, and I never supply a salt, again by best guidelines. My password check is simple:

                  $opts = ['cost' =>11]; // because this is what is ideal for my server
                  if(password_verify($pass, $hash, $opts)) {
                     $user->last_login = DB_NOW;
                     if( password_needs_rehash($hash, PASSWORD_DEFAULT, $opts) ) {
                        $user->pass = password_hash($pass, PASSWORD_DEFAULT, $opts);
                     }
                     $model->updateUser($user);
                  }
                  
                    Derokorian;11044695 wrote:

                    I use default, because that's what is recommended, and I never supply a salt, again by best guidelines. My password check is simple:

                    $opts = ['cost' =>11]; // because this is what is ideal for my server
                    if(password_verify($pass, $hash, $opts)) {
                       $user->last_login = DB_NOW;
                       if( password_needs_rehash($hash, PASSWORD_DEFAULT, $opts) ) {
                          $user->pass = password_hash($pass, PASSWORD_DEFAULT, $opts);
                       }
                       $model->updateUser($user);
                    }
                    

                    Thanks for your example, but aren't you worried about what happens when you have to move your code to some other server with a different, newer version of PHP? From the documentation:

                    PASSWORD_DEFAULT - Use the bcrypt algorithm (default as of PHP 5.5.0). Note that this constant is designed to change over time as new and stronger algorithms are added to PHP. For that reason, the length of the result from using this identifier can change over time.

                    If the default algorithm changes, then the hashes are going to be different, and no one is going to be able to log in. Not specifying some explicit algo seems really foolhardy to me, as you might end up saying "good grief, what was PASSWORD_DEFAULT when I wrote this code 5 years ago??"

                      Nope, that's not a concern at all. From the documentation:

                      The used algorithm, cost and salt are returned as part of the hash. Therefore, all information that's needed to verify the hash is included in it. This allows the password_verify() function to verify the hash without needing separate storage for the salt or algorithm information.
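A quick way to see this for yourself is to read the settings back out of a freshly made hash with password_get_info. The snippet below is only an illustration; the password and the cost values 10 and 11 are arbitrary choices, not recommendations.

```php
<?php
// Hash with an explicit bcrypt cost so the result is predictable.
$hash = password_hash('correct horse', PASSWORD_BCRYPT, ['cost' => 10]);

// The algorithm and cost are recovered from the hash string itself;
// nothing else was stored anywhere.
$info = password_get_info($hash);

// Verification needs only the candidate password and the stored hash.
$verified = password_verify('correct horse', $hash);

// Asking for a higher cost than the hash carries reports it as outdated.
$outdated = password_needs_rehash($hash, PASSWORD_BCRYPT, ['cost' => 11]);
```

Here $info['algoName'] comes back as 'bcrypt' and $info['options']['cost'] as 10, while the hash string itself begins with $2y$10$.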

                      And if you look at my example, I verify the password, and then immediately check if the hash needs to be updated to the current DEFAULT using password_needs_rehash.

                      Also, I made a boo-boo when cleaning it for public viewing:

                      $opts = ['cost' => 11]; // because this is what is ideal for my server

                      // first verify the password matches the stored hash
                      if (password_verify($pass, $hash)) {
                         $user->last_login = DB_NOW;

                         // now check if the hash needs to be updated to the current PASSWORD_DEFAULT
                         if (password_needs_rehash($hash, PASSWORD_DEFAULT, $opts)) {
                            // if so, update it by hashing the passed-in matching password
                            $user->pass = password_hash($pass, PASSWORD_DEFAULT, $opts);
                         }
                         $model->updateUser($user);
                      }

                        Ahhh. OK it looks like the PHP folks have addressed my concerns -- although I still find it nerve-wracking to think that PHP may change and some time in the distant future I might have a whole db full of millions of hashed passwords and I'll have to reverse engineer the hash technique somehow.

                          sneakyimp;11044711 wrote:

                          Ahhh. OK it looks like the PHP folks have addressed my concerns -- although I still find it nerve-wracking to think that PHP may change and some time in the distant future I might have a whole db full of millions of hashed passwords and I'll have to reverse engineer the hash technique somehow.

                          But you don't, that's the point of the password_needs_rehash function! I think you are missing the point of the code I posted. You use this on login to check if the password matches the hash currently stored in the db, by means of password_verify. If this succeeds, you then check if the hash is out of date by using password_needs_rehash. At this point you still have the password they logged in with in plain text, so you can easily rehash the password. The idea is that at no point do you ever have to "convert" your entire database; you only check upon login whether the stored hash should be updated, at which point you can easily update it because you have the password in plain text.

                          In the end, you never have a need to reverse engineer the hashes in the database (unless of course you are a hacker trying to steal information).

                            Derokorian;11044713 wrote:

                            But you don't, that's the point of the password_needs_rehash function! I think you are missing the point of the code I posted. You use this on login to check if the password matches the hash currently stored in the db, by means of password_verify. If this succeeds, you then check if the hash is out of date by using password_needs_rehash. At this point you still have the password they logged in with in plain text, so you can easily rehash the password. The idea is that at no point do you ever have to "convert" your entire database; you only check upon login whether the stored hash should be updated, at which point you can easily update it because you have the password in plain text.

                            In the end, you never have a need to reverse engineer the hashes in the database (unless of course you are a hacker trying to steal information).

                            No, no, I understand what is going on here. I appreciate your taking care to make sure I understand, and I will likely adopt precisely what you've suggested. It just seems a little worrisome to adopt an ever-shifting password hashing scheme and just rely on PHP to engineer a system that will always keep track of old hashing schemes. These are one-way hashes after all. If at any point in the future we have some hash stored in our db and the PHP functions don't recognize it, the original password is irretrievably gone. And we both know that a large user table is going to have abandoned accounts in it that will have those ancient hashed passwords, so one can never really forget about passwords hashed in days of yore.

                            At the moment, I'm wondering if I can be sure that the hashes in my current table (with 100k users) are distinguishable from hashes that will be generated by the password_hash function, or whether I need to create a new column/table to store my legacy hashes created by Joomla (our old framework/CMS). In particular, all the hashes in our user table seem to be of two forms:
                            md5 hashes with salt delimited by a colon
                            some other hash indicated by $P$ at the beginning of the hash.

                            Basically, I need to determine for a given user if they have a legacy password hash from the Joomla days or whether they have one generated by password_hash. Any thoughts for a reliable method to discern legacy password hashes from those generated by password_hash?

                              sneakyimp wrote:

                              It just seems a little worrisome to adopt an ever-shifting password hashing scheme and just rely on PHP to engineer a system that will always keep track of old hashing schemes. These are one-way hashes after all. If at any point in the future we have some hash stored in our db and the PHP functions don't recognize it, the original password is irretrievably gone.

                              It seems unlikely that a future version of PHP will drop support for a hash algorithm from password_verify. Even if it did, since the information to figure out the algorithm used is stored as part of the hash, it will always be possible to find out the algorithm and implement it yourself.

                              sneakyimp wrote:

                              In particular, all the hashes in our user table seem to be of two forms:
                              md5 hashes with salt delimited by a colon
                              some other hash indicated by $P$ at the beginning of the hash.

                              Basically, I need to determine for a given user if they have a legacy password hash from the Joomla days or whether they have one generated by password_hash. Any thoughts for a reliable method to discern legacy password hashes from those generated by password_hash?

                              By $P$ you mean literally the prefix string '$P$', or is P a placeholder? It seems to me that you can easily write some kind of regex for this.

                                sneakyimp;11044715 wrote:

                                Basically, I need to determine for a given user if they have a legacy password hash from the Joomla days or whether they have one generated by password_hash. Any thoughts for a reliable method to discern legacy password hashes from those generated by password_hash?

                                Everything generated by BCRYPT (the only hash supported by password_hash currently) will start with $2y$<COST>$, which means you can match on the first 4 characters for $2y$ and know that it is from BCRYPT.
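As an alternative to matching the $2y$ prefix by hand, password_get_info can be asked whether PHP itself recognises a stored hash. The following is a sketch only: the function name classifyHash and the labels it returns are invented here, and the two legacy checks simply follow the colon and $P$ descriptions given earlier in the thread.

```php
<?php
// Hypothetical helper: sort a stored hash into one of the formats discussed.
function classifyHash(string $stored): string
{
    // password_get_info reports an empty 'algo' (0 or null, depending on
    // the PHP version) for anything password_hash did not produce.
    $info = password_get_info($stored);
    if (!empty($info['algo'])) {
        return 'native';
    }
    // "v2" legacy hashes: literal $P$ prefix.
    if (strncmp($stored, '$P$', 3) === 0) {
        return 'legacy-v2';
    }
    // "v1" legacy hashes: md5 digest and salt separated by a colon.
    if (strpos($stored, ':') !== false) {
        return 'legacy-v1';
    }
    return 'unknown';
}
```

Checking the native case first means the legacy branches are only reached for strings PHP does not claim as its own, so a future default algorithm would still be classified as native as long as the running PHP version knows it.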

                                  laserlight;11044717 wrote:

                                  It seems unlikely that a future version of PHP will drop support for a hash algorithm from password_verify. Even if it did, since the information to figure out the algorithm used is stored as part of the hash, it will always be possible to find out the algorithm and implement it yourself.

                                  I'm certain that the PHP designers are going to be shrewder than I in designing some ever-after concept so I'm inclined to trust them. And I would agree that it seems unlikely (but not impossible) they'll drop some algorithm in the future. I suppose it's the "find out the algorithm and implement it yourself" that scares me. Perhaps it's leftover trauma from my early PHP days or something, but I have seen things deprecated which do break old sites that I have been forced to go back and fix. But I complain too much. I'm going to use password_hash. Might as well stop complaining about it.

                                  laserlight;11044717 wrote:

                                  By $P$ you mean literally the prefix string '$P$', or is P a placeholder? It seems to me that you can easily write some kind of regex for this.

                                  Yes, the literal string $P$. I can certainly find an expression to match that (and will do so momentarily). I was mostly concerned about some current or future password_hash somehow beginning with this string. As I will be migrating my existing password hashes to a new table, I expect I'll probably take the additional step of prepending LEGACY- to those old hashes just to be super safe.

                                  Anyone have any thoughts/criticisms of the OWASP authentication recommendations?

                                    sneakyimp wrote:

                                    I was mostly concerned about some current or future password_hash somehow beginning with this string. As I will be migrating my existing password hashes to a new table, I expect I'll probably take the additional step of prepending LEGACY- to those old hashes just to be super safe.

                                    Ah, good point. Another possibility though is that if you're storing the last login date/time, then you can compare that to the time of migration, i.e., if it is newer than the time of migration, you pass the job over to password_verify even though it matches the regex, otherwise you log the user in and migrate to use password_hash. I imagine the algorithm would be something like this:

                                    if hashed password is of a legacy format and last login is not newer than time of migration:
                                        if password is verified:
                                            password_hash then store
                                            return user is authenticated
                                        else:
                                            return user is not authenticated
                                    
                                    if password_verify:
                                        if password_needs_rehash:
                                            password_hash then store
                                        user is authenticated
                                    else:
                                        user is not authenticated
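The pseudocode above might translate into PHP roughly as follows. Treat it as a sketch under assumptions: the user is a plain array standing in for a database row, only the colon-delimited md5 format is handled, and that format is assumed to be md5(password . salt) followed by a colon and the salt, which should be checked against the real data. The $P$ hashes would need their own verifier and are left out. The function names are made up.

```php
<?php
// Assumed legacy layout: md5(password . salt) . ':' . salt (verify before use).
function verifyLegacyMd5(string $password, string $stored): bool
{
    $parts = explode(':', $stored, 2);
    if (count($parts) !== 2) {
        return false;
    }
    [$digest, $salt] = $parts;
    return hash_equals($digest, md5($password . $salt));
}

// $user needs 'hash' (string) and 'last_login' (int timestamp or null).
function authenticate(array &$user, string $password, int $migratedAt, int $now): bool
{
    $isLegacyFormat = strpos($user['hash'], ':') !== false;
    $loggedInSinceMigration = $user['last_login'] !== null
        && $user['last_login'] > $migratedAt;

    if ($isLegacyFormat && !$loggedInSinceMigration) {
        if (!verifyLegacyMd5($password, $user['hash'])) {
            return false;
        }
        // The plain-text password is in hand, so upgrade the stored hash now.
        $user['hash'] = password_hash($password, PASSWORD_DEFAULT);
        $user['last_login'] = $now;
        return true;
    }

    if (!password_verify($password, $user['hash'])) {
        return false;
    }
    if (password_needs_rehash($user['hash'], PASSWORD_DEFAULT)) {
        $user['hash'] = password_hash($password, PASSWORD_DEFAULT);
    }
    $user['last_login'] = $now;
    return true;
}
```

After the first successful login the stored value is a password_hash result, so later logins fall through to the password_verify branch.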
                                      sneakyimp wrote:

                                      I'm certain that the PHP designers are going to be shrewder than I in designing some ever-after concept so I'm inclined to trust them

                                      Well.... they still support MD2, which has been obsolete for something like ten years now.

                                      I was mostly concerned about some current or future password_hash somehow beginning with this string.

                                      That is a potential problem; there's no oversight on prefix/algorithm mappings (see, for example, http://pythonhosted.org/passlib/modular_crypt_format.html and https://github.com/ademarre/binary-mcf), and hence no provision for "private-use" prefixes. Maybe the IANA will one day keep a registry.