I've got a website that's been running about ten years. It's accumulated some 2.4M users and started to have unacceptable performance problems with new registrations some time ago. We are redesigning it.
Obviously, I'm hoping to improve performance for the user table. The original table, despite having proper indexes, was causing performance issues. It is my belief that this is because the data file has grown quite large. The original table has nearly 60 columns in it (!) so my plan is to create a really simple basic user table:
id
email_address
* hashed_password
and break out all the other columns into separate tables (e.g., user_profile) or something like that.
So I created the new simpler user table, chose InnoDB for the engine, and concocted a query to import records from the live db. I was pretty surprised to see that the new table (352.7Mib including indexes according to PHPMyAdmin) was not that much smaller than the old table (818.5Mib including indexes) despite the fact that I had eliminated over 90% of the columns in the table AND had decreased the length of the email_address field from 255 chars to just 70 chars.
When I changed the engine on the new table from InnoDB to MyISAM it resulted in a dramatic reduction in the db size (270.2Mib including indexes).
So I started wondering about the differences between MyISAM and InnoDB and found this rackspace page which looks pretty helpful. I'm still wondering though if maybe MyISAM might be better. I don't foresee using a lot of transactional type stuff on the user table? Or perhaps I should....
The basic performance-killing operation is that I want to check if an email address is in use before allowing someone to register with it (emails must be unique). Two approaches come to mind:
try to insert email address and catch error that pops up with UNIQUE index prevents double entry of a particular address
use transactional approach: LOCK TABLE, check for address, if it doesn't exist, INSERT, then UNLOCK.
In practice (and this is very hard to simulate), I have classrooms with dozens of people all trying to register at the same time and the table-locking approach sounds like it might really have a fundamental performance problem. Can anyone comment here? What is the best way to approach this problem?