[RESOLVED] Security Question

robkir

I'm still a newbie, getting better with every attempt, but still woefully undereducated. I have a security question.

I am setting up a database in which potential users will register themselves. What I am planning to do is publish a validation code for potential users to copy. The user then goes to another page and enters the validation code to get to the registration form. He enters his name and an email address and submits the form. The script then adds him to the database and sends an auto-reply with a temporary username and password. The user can then log into the database and change his username and password to anything he/she wishes.

So the question--does this sound strong enough to ward off the 'bots'? And... am I inviting trouble from a bored 13-year-old hacker?

Thanks,

Robkir

Nile

It's okay, but I would let the user choose their own username and password. Also, I would send the validation code in the email, so that the user has to enter that code when registering. The most important thing is to make sure you escape the strings before you enter them into the database!

bradgrafelman

The registration code will force users to use valid e-mail addresses if you e-mail it to them, yes.

If it's "bots" you're wanting to ward off, another good line of defense is adding a good CAPTCHA to your registration form.

One of my favorite ideas early on in the whole CAPTCHA craze (you can find tons of the image variety out there) was to use randomly-generated simple questions, such as "Type the answer to: 1 times seven."
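For what it's worth, here is a minimal sketch of that kind of question generator. The function names `make_question()` and `check_answer()` are made up for illustration, and in a real form you would presumably keep the expected answer on the server (e.g. in `$_SESSION`) rather than sending it to the browser.

```php
<?php
// Hypothetical helper: builds a random arithmetic question plus its expected answer.
function make_question(): array
{
    $words = ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine'];
    $a = random_int(1, 9);
    $b = random_int(1, 9);

    // Pick one of two operations at random; the key doubles as the wording.
    $ops = ['plus' => $a + $b, 'times' => $a * $b];
    $op  = array_rand($ops);

    return [
        // One operand is spelled out so the question is not trivially machine-readable.
        'text'   => "Type the answer to: {$a} {$op} {$words[$b]}.",
        'answer' => $ops[$op],
    ];
}

// Hypothetical helper: compares the submitted text with the stored answer.
function check_answer(array $question, string $input): bool
{
    $input = trim($input);
    return ctype_digit($input) && (int) $input === $question['answer'];
}
```

Mixing digits and spelled-out numbers is just one way to vary the wording; a determined bot can still be taught to parse it.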

robkir

Hi Nile,

I like the idea of emailing the validation code. Thanks.

I was going to use mysql_real_escape_string, strip_tags and addslashes. Is there anything else I should consider?

Everything I've written so far has been for a closed community -- it's a bit scary to think about dealing with unknown users, but fortunately nothing is data critical and I do back up regularly. So hopefully this will all work.

robkir

Hi bradgrafelman,

I looked at the CAPTCHA and decided against it--but I really like your idea of having a randomly generated question set. I assume one would put the answer in the database so that no hints are visible in the script?

Thanks,

Robkir

Nile

Make sure you either sha1, or md5 encrypt the password. I've got no more suggestions besides also using htmlentities!
If you want captcha without the struggle, use http://recaptcha.net/.

bradgrafelman

robkir wrote:
I was going to use mysql_real_escape_string, strip_tags and addslashes.

Well, I'd say you're on the right track for using a DBMS-specific escaping function (just note that the 'mysql' library is old and outdated; we're moving on to bigger and better things such as [man]mysqli[/man], [man]PDO[/man], etc. 😛), but I'd be cautious about throwing in extra functions on top of each other.

For example, using [man]strip_tags[/man] may work for your application. Personally, I would rather go with Nile's suggestion and use [man]htmlentities[/man] when displaying data from the database (not when storing it - destroys data integrity, etc.).

[man]addslashes[/man], on the other hand, should never be used (speaking about this scenario only, obviously - not in general). You don't use it for SQL query preparation, and you shouldn't need it to produce proper output.

robkir wrote:
I looked at the CAPTCHA and decided against it--but I really like your idea of having a randomly generated question set. I assume one would put the answer in the database so that no hints are visible in the script?

Well either way, make sure you do something in the form of a CAPTCHA; having one isn't really an option anymore these days.

As for my example of an alternative, it doesn't matter where you store the questions and answers. Storing them directly into a PHP script is just fine; no one on the web should be able to see your PHP code unless you've got serious configuration issues with your webserver (meaning it's not running PHP code but instead passing it along as a normal text or HTML document).

Nile wrote:
Make sure you either sha1, or md5 encrypt the password.

Excellent point, though I might make one correction; those methods of securing passwords you listed are hashing algorithms, not to be confused with encryption. 😉

Also note that md5 doesn't really offer much security anymore. sha1, sha256, or something along those lines would be a preferred method of hashing a password.
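As a hedged aside: PHP 5.5 and later ship `password_hash()` and `password_verify()`, which salt the password and use a deliberately slow algorithm. The sketch below swaps those in for the raw sha1/sha256 calls discussed here; the wrapper names `hash_new_password()` and `check_password()` are hypothetical.

```php
<?php
// Hypothetical wrapper: hash a password before storing it in the database.
function hash_new_password(string $plain): string
{
    // PASSWORD_DEFAULT lets PHP pick its current recommended algorithm;
    // a random salt is generated and embedded in the returned string.
    return password_hash($plain, PASSWORD_DEFAULT);
}

// Hypothetical wrapper: check a login attempt against the stored hash.
function check_password(string $plain, string $storedHash): bool
{
    return password_verify($plain, $storedHash);
}
```

Only the returned hash string goes into the database; the plain password is never stored.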

robkir

Hi Nile,
I forgot about htmlentities. I'll stick that one in too.

I took a look at recaptcha.net -- looks promising. I liked bradgrafelman's idea as well. I need to read a bit more to make a decision. Never a dull moment when it comes to spammers and hackers.

Thanks for the advice.

robkir

Nile

I'm glad to advise you! Thanks for the correction, brad!

bradgrafelman

One more comment I'll add about modifying data before SQL queries:

As I mentioned above, you want to be cautious about piling functions on top of each other (well, nesting them within one another I guess is a better way of putting it). If you do decide to use something like htmlentities() before the data is inserted into a DB, make sure that the DBMS-specific escaping function (e.g. [man]mysql_real_escape_string[/man]) is the last (or outer-most) function applied to the data; that way any modifications done by the other function(s) you use won't inadvertently introduce errors into your SQL query.

Nile

Yes, a general rule in all coding: put the most important thing last (or on the outside).

important(lessImportant(hardlyImportant()))

(You made it sound too specific)

bradgrafelman

Nile wrote:
(You made it sound too specific)

Eh.. I suppose I did that on purpose, really.

I don't think of it as "importance" but just another order-of-operations type of problem. I can't really think of an example or a better way to generalize what I was talking about; I personally just have to sit there and logically analyze the flow of data and figure out what might happen if data was transformed by 'foo' and then 'bar' rather than by 'bar' and then by 'foo.'

I suppose if you know the importance of SQL escaping trumps the importance of HTML escaping in this situation, then yeah, your methodology works too. For some (especially newcomers into the world of security and programming), however, that might not be as obvious. :p

robkir

Okay...good advice. I went online and read up on all of them. But, you know what? I'm still not sure which ones I should use.

What would you use to inspect the incoming data and display the outgoing? No tricky data here, just names, email, phone, passwords, user names--stuff like that.

I've actually spent a fair amount of time trying to understand this aspect of programming. It seems to me every article I read has a slightly different way of vetting the data. I would think my goals are to keep out characters that SQL doesn't like and present data to the user without funny characters (or choke HTML). Your thoughts are much appreciated.

Also, bradgrafelman, a follow-up question on the CAPTCHA scheme. I naively thought that the bots were somehow examining the script to see what the answers were. How else would they know what to key in to gain access?

Thanks to both of you for your help.

robkir

bradgrafelman

robkir wrote:
What would you use to inspect the incoming data and display the outgoing? No tricky data here, just names, email, phone, passwords, user names--stuff like that.

Well, part of this comes down to the programmer's choice. For example, what characters do you want a username to consist of? It's usually easier to construct a "white-list" of allowed characters rather than trying to block the ones you don't want. There are also things like minimum/maximum length and required characters to think about (e.g. would you want me registering with a username of simply ' ', a space?).
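To make the white-list idea concrete, here is a rough sketch. The rule itself (letters, digits, and underscores, 3 to 20 characters) and the name `is_valid_username()` are only assumptions for illustration; pick whatever rule suits your site.

```php
<?php
// Hypothetical validator: accept only characters on the white-list,
// and enforce a minimum and maximum length in the same pattern.
function is_valid_username(string $name): bool
{
    // \A and \z anchor the whole string, so nothing may precede or
    // follow the allowed characters (not even a trailing newline).
    return preg_match('/\A[A-Za-z0-9_]{3,20}\z/', $name) === 1;
}
```

With a rule like this, a username of a single space fails on both the character check and the length check.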

Data validation aside, you most likely want to escape the data for SQL queries. I say most likely because I don't always escape data after I've done all of my sanity checks. If I test that their phone number contains nothing but numbers, dashes, spaces, or periods, then I already know that it's not going to break my SQL query so escaping it is a moot point. Likewise for passwords; I never store the password itself but either a hashed or encrypted version of the password (neither of which, depending upon the algorithms used, tend to produce characters that break SQL statements). Regardless, it never hurts to err on the safer side, so feel free to escape all data with the DBMS-specific escaping mechanism. If you were using MySQLi/PDO with prepared statements, for example, the escaping is done by the database driver automagically.
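Here is a small sketch of the prepared-statement route with PDO. To keep it self-contained it assumes the `pdo_sqlite` extension and uses an in-memory SQLite database as a stand-in for MySQL; the `users` table and the function names are made up for the example.

```php
<?php
// Assumption: pdo_sqlite is available. With MySQL only the DSN and credentials change.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)');

// Hypothetical helper: the values are bound to placeholders,
// so they are never pasted into the SQL string by hand.
function add_user(PDO $pdo, string $name, string $email): void
{
    $stmt = $pdo->prepare('INSERT INTO users (name, email) VALUES (:name, :email)');
    $stmt->execute([':name' => $name, ':email' => $email]);
}

// Hypothetical helper: returns the stored name, or null if no row matches.
function find_name_by_email(PDO $pdo, string $email): ?string
{
    $stmt = $pdo->prepare('SELECT name FROM users WHERE email = :email');
    $stmt->execute([':email' => $email]);
    $name = $stmt->fetchColumn();
    return $name === false ? null : $name;
}

// A name containing an apostrophe goes in and comes back out unchanged.
add_user($pdo, "O'Reilly", 'tim@example.com');
```

No manual escaping call appears anywhere; the driver handles the bound values.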

Note that I didn't mention anything about HTML. When I store user-supplied data into a database, I store just that: the data that the user supplied. Did the user supply "i < u" or "i &lt; u"? They clearly aren't the same. Whenever I display this data on an HTML page, however, I retrieve the data from the database and use HTML-sanitizing functions such as [man]htmlentities[/man] to make sure the given data won't disrupt my HTML (and to prevent XSS; what if my username was "<script>window.location='some_bad_place.com'</script>" and you decided to output that without an HTML-sanitizing function?).
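A quick sketch of escaping at output time. The short helper name `h()` is hypothetical, and the `ENT_QUOTES` flag and UTF-8 charset are assumptions about the page.

```php
<?php
// Hypothetical helper: escape a stored value just before it is placed into HTML.
function h(string $value): string
{
    // ENT_QUOTES also converts single and double quotes, which matters
    // when the value ends up inside an HTML attribute.
    return htmlentities($value, ENT_QUOTES, 'UTF-8');
}

// The raw value is what sits in the database; only the output is escaped.
$username = "<script>window.location='some_bad_place.com'</script>";
$html = '<p>Welcome, ' . h($username) . '</p>';
```

The browser then shows the script tag as literal text instead of running it.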

robkir wrote:
It seems to me every article I read has a slightly different way of vetting the data.

Not surprising. There's nothing close to a unified "one-size-fits-all" coding approach that everyone knows and uses. Best I can tell you is to try and learn the basics of security, data integrity, etc. and do the best you can to write your code around those ideas. Don't forget, it takes nothing but a bit of spare time to publish an article on the internet these days - no credentials or proven experience required.

robkir wrote:
I would think my goals are to keep out characters that SQL doesn't like and present data to the user without funny characters (or choke HTML).

I pretty much stated my whole take on this above, but just to re-iterate: after all of your own personal data validation (length, (dis)allowed content, etc.), I like to maintain the data integrity.

If a person's last name is O'Reilly, surely you won't turn them away just because the apostrophe might throw a wrench into things, right? Validate the data, then sanitize it for SQL queries - it won't matter what you throw into the mix after that.

robkir wrote:
How else would they know what to key in to gain access?

By simply loading up the CAPTCHA image presented, filtering out the background noise, and reading in the character matches. The more distorted the image is, the harder it is for the bot (and the human, unfortunately) to automatically parse the characters out of it.

robkir

Okay. I've got a clearer picture and hopefully I can handle it from here. Thanks for spending the time to answer all my questions. I really appreciate it.

Cheers,

robkir