I'm refactoring a site I built back when my PHP and database knowledge was caveman-primitive. It has been running fabulously and profitably for years, but traffic is growing and it's starting to creak under the load. A variety of things need refactoring, but the database access is probably the most important. Yesterday the DB server (an Amazon RDS instance) had its CPU maxed out, and it stayed that way essentially all day during weekday business hours. I tweaked some MySQL parameters to increase the memory available for caching, and the server is much happier now that it's actually using its copious memory, but the DB is still working hard.
I've got some basic questions:
1) Is there some way to alter my database access class to recognize which queries might benefit from an index?
I know from the MySQL status vars (see below) that some queries are causing table scans and that adding indexes would probably help. The problem is that I have hundreds of queries and hundreds of tables, and I'm not sure where to start. All queries go through a centralized DB class, but I don't want to introduce monitoring changes that might severely hamper performance. Is there some easy way to report queries that aren't using indexes properly? I was imagining some kind of pattern matching on EXPLAIN output, but that sounds tricky and bad for performance.
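For what it's worth, here's the sort of thing I was picturing: a rough, untested sketch where my DB wrapper samples a small fraction of SELECTs, runs EXPLAIN on them, and logs anything that looks like a full scan. The class, property, and connection details below are placeholders, not my real code. (I gather the server-side slow query log with log_queries_not_using_indexes might do much the same job, which is partly why I'm hesitant to roll my own.)

```php
<?php
// Untested sketch: sample a small fraction of SELECTs through the central DB
// class, EXPLAIN them, and log anything that looks like a full table scan.
// Class name, property names, and connection details are placeholders.

class Db
{
    private $conn;

    public function __construct($host, $user, $pass, $dbname)
    {
        $this->conn = new mysqli($host, $user, $pass, $dbname);
    }

    public function query($sql)
    {
        // Sample ~1% of SELECTs so the extra EXPLAIN round trip stays negligible.
        if (stripos(ltrim($sql), 'SELECT') === 0 && mt_rand(1, 100) === 1) {
            $this->explainAndLog($sql);
        }
        return $this->conn->query($sql);
    }

    private function explainAndLog($sql)
    {
        $result = $this->conn->query('EXPLAIN ' . $sql);
        if (!$result) {
            return; // EXPLAIN can fail on odd statements; just skip those.
        }
        while ($row = $result->fetch_assoc()) {
            // "type" = ALL (full scan) or an empty "key" are the red flags.
            if ($row['type'] === 'ALL' || empty($row['key'])) {
                error_log(sprintf(
                    '[no-index] table=%s type=%s rows=%s sql=%s',
                    $row['table'], $row['type'], $row['rows'], $sql
                ));
            }
        }
        $result->free();
    }
}
```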
2) What changes might I make to anticipate the use of a cluster instead of a single machine?
Given our rate of growth, I expect we will soon outgrow a single-machine DB server configuration and we'll need to move to a cluster. I've looked into this a bit and vaguely understand that when you have a master-slave configuration, there is typically some (hopefully short) period during which master and slaves are out of sync. I also understand (perhaps incorrectly) that it's easier to set up a single master and multiple replicating slaves rather than a multiple-master cluster where the db nodes somehow manage to keep data in sync. That said, it seems clear that I'll probably want to distinguish my write queries from my read queries so I can route writes to the master and reads to a slave. Also, I might need to alter my code to keep in mind the lag time between master and slaves. Lastly, it seems like it might be wise to delegate busy session-related tables to the slaves and try to make sure that a) users are consistently served by a single web server and b) web servers consistently connect to the same DB slave. Help! I need some input here.
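Here's an untested sketch of the read/write split I'm picturing. The hostnames, credentials, and the "read from master after a write" rule are placeholders and guesses on my part, not anything I've validated.

```php
<?php
// Untested sketch of routing writes to the master and reads to a slave.
// Hostnames and credentials are placeholders.

class ReplicatedDb
{
    private $master;
    private $slave;
    private $wroteThisRequest = false;

    public function __construct()
    {
        $this->master = new mysqli('master.db.example.internal', 'app', 'secret', 'mydb');
        $this->slave  = new mysqli('slave1.db.example.internal', 'app', 'secret', 'mydb');
    }

    public function query($sql)
    {
        // Reads go to the slave unless we've already written during this request,
        // in which case we read from the master to sidestep replication lag.
        if ($this->isRead($sql) && !$this->wroteThisRequest) {
            return $this->slave->query($sql);
        }
        if (!$this->isRead($sql)) {
            $this->wroteThisRequest = true;
        }
        return $this->master->query($sql);
    }

    private function isRead($sql)
    {
        // Crude classification: plain SELECTs count as reads; SELECT ... FOR UPDATE does not.
        $sql = ltrim($sql);
        return stripos($sql, 'SELECT') === 0 && stripos($sql, 'FOR UPDATE') === false;
    }
}
```

Presumably the same idea could pick among several slaves (e.g. hashing the session ID so a given user always hits the same slave), which is the kind of stickiness I was wondering about in (b) above.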
3) Given the following detail from phpMyAdmin, any thoughts on my performance levels and how I might improve them?
Please note that I spent several hours trying to increase table cache sizes and so on in response to these values, and I have seen the server use more memory, but I'm reluctant to make more drastic changes lest I introduce instability. Any advice on specific settings would be much appreciated. (There's a small sketch of how I've been trying to read these numbers after the status output below.)
// salient details from phpmyadmin's "status" tab:
This MySQL server has been running for 0 days, 23 hours, 26 minutes and 39 seconds.
Since its startup, 4,625,274 queries have been sent to the server.
// query type | total | avg per hour | % of all queries
select       641 k     27.342 k    14.68%
update       280 k     11.960 k     6.42%
set option   259 k     11.032 k     5.92%
change db    258 k     11.000 k     5.91%
insert       169 k      7.216 k     3.87%
delete        20 k      840.038     0.45%
// items flagged in red to indicate potential problems / areas for improvement:
Slow_queries 244 The number of queries that have taken more than long_query_time seconds.
Innodb_buffer_pool_reads 78 The number of logical reads that InnoDB could not satisfy from the buffer pool and had to do a single-page read.
Handler_read_rnd 7,456 k The number of requests to read a row based on a fixed position. This is high if you are doing a lot of queries that require sorting of the result. You probably have a lot of queries that require MySQL to scan whole tables or you have joins that don't use keys properly.
Handler_read_rnd_next 79 G The number of requests to read the next row in the data file. This is high if you are doing a lot of table scans. Generally this suggests that your tables are not properly indexed or that your queries are not written to take advantage of the indexes you have.
Slow_launch_threads 2 The number of threads that have taken more than slow_launch_time seconds to create.
Created_tmp_disk_tables 35 k The number of temporary tables on disk created automatically by the server while executing statements. If Created_tmp_disk_tables is big, you may want to increase the tmp_table_size value to cause temporary tables to be memory-based instead of disk-based.
Select_full_join 7,460 The number of joins that do not use indexes. If this value is not 0, you should carefully check the indexes of your tables.
Select_range_check 3 The number of joins without keys that check for key usage after each row. (If this is not 0, you should carefully check the indexes of your tables.)
Sort_merge_passes 11 k The number of merge passes the sort algorithm has had to do. If this value is large, you should consider increasing the value of the sort_buffer_size system variable.
Opened_tables 7,089 The number of tables that have been opened. If opened tables is big, your table cache value is probably too small.
Table_locks_waited 57 k The number of times that a table lock could not be acquired immediately and a wait was needed. If this is high, and you have performance problems, you should first optimize your queries, and then either split your table or tables or use replication.
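For context, here's a rough, untested sketch of how I've been trying to make sense of these counters: pulling SHOW GLOBAL STATUS / SHOW GLOBAL VARIABLES and turning a couple of the raw numbers into ratios. The connection details are placeholders, and whether the cache variable is table_cache or table_open_cache depends on the MySQL version.

```php
<?php
// Rough, untested sketch: compute a couple of ratios from SHOW GLOBAL STATUS
// and SHOW GLOBAL VARIABLES instead of staring at the raw counters above.
// Connection details are placeholders.

$db = new mysqli('localhost', 'monitor', 'secret', 'mysql');

function keyValueRows($db, $sql)
{
    $out = array();
    $res = $db->query($sql);
    while ($row = $res->fetch_row()) {
        $out[$row[0]] = $row[1]; // Variable_name => Value
    }
    $res->free();
    return $out;
}

$status = keyValueRows($db, 'SHOW GLOBAL STATUS');
$vars   = keyValueRows($db, 'SHOW GLOBAL VARIABLES');

// Share of implicit temporary tables that spilled to disk -- what the
// Created_tmp_disk_tables warning above is really about.
$tmpDiskRatio = $status['Created_tmp_disk_tables'] / max(1, $status['Created_tmp_tables']);

// Table-cache churn: tables opened per second since startup.
$openedPerSec = $status['Opened_tables'] / max(1, $status['Uptime']);

// The cache size variable is table_open_cache on newer servers, table_cache on older ones.
$tableCache = isset($vars['table_open_cache']) ? $vars['table_open_cache'] : $vars['table_cache'];

printf("temp tables going to disk: %.1f%%\n", 100 * $tmpDiskRatio);
printf("tables opened per second: %.2f (cache size: %s)\n", $openedPerSec, $tableCache);
```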