robkir wrote:What would you use to inspect the incoming data and displaying the outgoing? No tricky data here, just names, email, phone, passwords, user names--stuff like that.
Well, part of this comes down to the programmer's choice. For example, what characters do you want a username to consist of? It's usually easier to construct a "white-list" of allowed characters rather than trying to block the ones you don't want. There are also things like minimum/maximum length and required characters to think about (e.g. would you want me registering with a username of simply ' ', a space?).
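To make the white-list idea concrete, here is a minimal sketch in Python (in PHP you would likely reach for preg_match() instead). The policy itself is purely an assumption for illustration: 3 to 20 characters, ASCII letters, digits, and underscores only. The names USERNAME_RE and is_valid_username are made up for this example.

```python
import re

# Hypothetical policy: 3-20 characters, ASCII letters, digits, underscore only.
# Anything not explicitly listed here is rejected (white-list approach).
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,20}")


def is_valid_username(name: str) -> bool:
    """Return True only if the whole string matches the allowed pattern."""
    # fullmatch() requires the entire string to match, so stray spaces,
    # newlines, or markup anywhere in the input cause a rejection.
    return USERNAME_RE.fullmatch(name) is not None
```

With this policy, "robkir" would pass while a lone space, a two-character name, or anything containing "<" would be refused. Your own rules may well differ; the point is only that you list what is allowed instead of chasing what is not.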
Data validation aside, you most likely want to escape the data for SQL queries. I say most likely because I don't always escape data after I've done all of my sanity checks. If I test that their phone number contains nothing but digits, dashes, spaces, or periods, then I already know that it's not going to break my SQL query, so escaping it is a moot point. Likewise for passwords; I never store the password itself but either a hashed or encrypted version of it (neither of which, depending upon the algorithms used, tends to produce characters that break SQL statements). Regardless, it never hurts to err on the safer side, so feel free to escape all data with the DBMS-specific escaping mechanism. If you were using MySQLi/PDO with prepared statements, for example, the driver takes care of this automagically: the values are bound to placeholders rather than pasted into the query string.
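As a rough illustration of the prepared-statement approach, here is a small Python sketch using the standard sqlite3 module in place of MySQLi/PDO (that substitution, the users table, and the store_and_fetch helper are all assumptions for the sake of a self-contained example). The idea carries over: the value goes in through a placeholder, never through string concatenation.

```python
import sqlite3


def store_and_fetch(names):
    """Insert each name with a bound parameter, then read them all back."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, last_name TEXT)")
    for name in names:
        # The ? placeholder keeps the value out of the SQL text entirely,
        # so quotes and semicolons in the data are stored as plain data.
        conn.execute("INSERT INTO users (last_name) VALUES (?)", (name,))
    rows = conn.execute("SELECT last_name FROM users ORDER BY id").fetchall()
    conn.close()
    return [row[0] for row in rows]


# An ordinary name with an apostrophe, plus a deliberately hostile string.
stored = store_and_fetch(["O'Reilly", "x'); DROP TABLE users; --"])
```

Both values should come back exactly as they went in, and the table is still there to be queried afterwards.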
Note that I didn't mention anything about HTML. When I store user-supplied data into a database, I store just that: the data that the user supplied. Did the user supply "i < u" or "i &lt; u"? They clearly aren't the same. Whenever I display this data on an HTML page, however, I retrieve the data from the database and use HTML-sanitizing functions such as htmlentities() to make sure the given data won't disrupt my HTML (and to prevent XSS; what if my username was "<script>window.location='some_bad_place.com'</script>" and you decided to output that without an HTML-sanitizing function?).
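Here is a sketch of the "store raw, escape on output" idea, again in Python. Its html.escape() behaves more like PHP's htmlspecialchars() than htmlentities(), so treat it as an approximation; render_username is a hypothetical helper.

```python
import html


def render_username(raw: str) -> str:
    """Escape a stored value at output time, just before it goes into HTML."""
    # quote=True also escapes " and ', which matters inside attribute values.
    return "<p>Welcome, " + html.escape(raw, quote=True) + "</p>"
```

The database keeps whatever the user typed, and only the copy sent to the browser is escaped. A stored "i < u" would be rendered as "i &lt; u", and the script-tag username above would show up on the page as harmless text instead of running.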
robkir wrote:It seems to me every article I read has a slightly different way of doing vetting the data.
Not surprising. There's nothing close to a unified "one-size-fits-all" coding approach that everyone knows and uses. Best I can tell you is to try and learn the basics of security, data integrity, etc. and do the best you can to write your code around those ideas. Don't forget, it takes nothing but a bit of spare time to publish an article on the internet these days - no credentials or proven experience required.
robkir wrote:I would think my goals are to keep out characters that SQL doesn't like and present data to the user without funny characters (or choke HTML).
I pretty much stated my whole take on this above, but just to reiterate: after all of your own data validation (length, (dis)allowed content, etc.), I like to keep the data exactly as the user supplied it.
If a person's last name is O'Reilly, surely you won't turn them away just because the apostrophe might throw a wrench into things, right? Validate the data, then sanitize it for SQL queries - it won't matter what you throw into the mix after that.
robkir wrote:How else would they know what to key in to gain access?
By simply loading the CAPTCHA image presented, filtering out the background noise, and matching the remaining shapes against known characters. The more distorted the image is, the harder it is for a bot (and for a human, unfortunately) to parse the characters out of it.