I'm refactoring a site I built back when my PHP and database knowledge was caveman-primitive. It has been running fabulously and profitably for years, but traffic is growing and it's starting to creak under the load. A variety of things need refactoring, but the database access is probably the most important. Yesterday the DB server (an Amazon RDS instance) had its CPU maxed out, and it stayed that way essentially all day during weekday business hours. I tweaked some MySQL parameters to increase the memory available for caching, and the server is much happier now that it's actually using its copious memory, but the DB is still working hard.
I've got some basic questions:
1) Is there some way to alter my database access class to recognize which queries might benefit from an index?
I know from the MySQL status vars (see below) that some queries are causing table scans and that adding indexes would probably help. The problem is that I have hundreds of queries and hundreds of tables, and I'm not sure where to start. All queries go through a centralized DB class, but I don't want to introduce monitoring changes that might severely hamper performance. Is there some easy way to report queries that aren't using indexes properly? I was imagining some kind of pattern matching on EXPLAIN output, but that sounds tricky and bad for performance.
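For what it's worth, here's the sort of thing I was picturing: a rough, untested sketch where my DB wrapper samples a small fraction of SELECTs, runs EXPLAIN on them, and logs anything that looks like a full scan. The class, property, and connection details below are placeholders, not my real code. (I gather the server-side slow query log with log_queries_not_using_indexes might do much the same job, which is partly why I'm hesitant to roll my own.)

```php
<?php
// Untested sketch: sample a small fraction of SELECTs through the central DB
// class, EXPLAIN them, and log anything that looks like a full table scan.
// Class name, property names, and connection details are placeholders.

class Db
{
    private $conn;

    public function __construct($host, $user, $pass, $dbname)
    {
        $this->conn = new mysqli($host, $user, $pass, $dbname);
    }

    public function query($sql)
    {
        // Sample ~1% of SELECTs so the extra EXPLAIN round trip stays negligible.
        if (stripos(ltrim($sql), 'SELECT') === 0 && mt_rand(1, 100) === 1) {
            $this->explainAndLog($sql);
        }
        return $this->conn->query($sql);
    }

    private function explainAndLog($sql)
    {
        $result = $this->conn->query('EXPLAIN ' . $sql);
        if (!$result) {
            return; // EXPLAIN can fail on odd statements; just skip those.
        }
        while ($row = $result->fetch_assoc()) {
            // "type" = ALL (full scan) or an empty "key" are the red flags.
            if ($row['type'] === 'ALL' || empty($row['key'])) {
                error_log(sprintf(
                    '[no-index] table=%s type=%s rows=%s sql=%s',
                    $row['table'], $row['type'], $row['rows'], $sql
                ));
            }
        }
        $result->free();
    }
}
```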
2) What changes might I make to anticipate the use of a cluster instead of a single machine?
Given our rate of growth, I expect we will soon outgrow a single-machine DB server configuration and we'll need to move to a cluster. I've looked into this a bit and vaguely understand that when you have a master-slave configuration, there is typically some (hopefully short) period during which master and slaves are out of sync. I also understand (perhaps incorrectly) that it's easier to set up a single master and multiple replicating slaves rather than a multiple-master cluster where the db nodes somehow manage to keep data in sync. That said, it seems clear that I'll probably want to distinguish my write queries from my read queries so I can route writes to the master and reads to a slave. Also, I might need to alter my code to keep in mind the lag time between master and slaves. Lastly, it seems like it might be wise to delegate busy session-related tables to the slaves and try to make sure that a) users are consistently served by a single web server and b) web servers consistently connect to the same DB slave. Help! I need some input here.
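Here's an untested sketch of the read/write split I'm picturing. The hostnames, credentials, and the "read from master after a write" rule are placeholders and guesses on my part, not anything I've validated.

```php
<?php
// Untested sketch of routing writes to the master and reads to a slave.
// Hostnames and credentials are placeholders.

class ReplicatedDb
{
    private $master;
    private $slave;
    private $wroteThisRequest = false;

    public function __construct()
    {
        $this->master = new mysqli('master.db.example.internal', 'app', 'secret', 'mydb');
        $this->slave  = new mysqli('slave1.db.example.internal', 'app', 'secret', 'mydb');
    }

    public function query($sql)
    {
        // Reads go to the slave unless we've already written during this request,
        // in which case we read from the master to sidestep replication lag.
        if ($this->isRead($sql) && !$this->wroteThisRequest) {
            return $this->slave->query($sql);
        }
        if (!$this->isRead($sql)) {
            $this->wroteThisRequest = true;
        }
        return $this->master->query($sql);
    }

    private function isRead($sql)
    {
        // Crude classification: plain SELECTs count as reads; SELECT ... FOR UPDATE does not.
        $sql = ltrim($sql);
        return stripos($sql, 'SELECT') === 0 && stripos($sql, 'FOR UPDATE') === false;
    }
}
```

Presumably the same idea could pick among several slaves (e.g. hashing the session ID so a given user always hits the same slave), which is the kind of stickiness I was wondering about in (b) above.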
3) Given the following detail from phpMyAdmin, any thoughts on my performance levels and how I might improve them?
Please note that I spent several hours trying to increase table cache sizes and so on in response to these values, and I have seen the server use more memory, but I'm reluctant to make more drastic changes lest I introduce instability. Any advice on specific settings would be much appreciated. (There's a small sketch of how I've been trying to read these numbers after the status output below.)
// salient details from phpmyadmin's "status" tab:
This MySQL server has been running for 0 days, 23 hours, 26 minutes and 39 seconds.
Since its startup, 4,625,274 queries have been sent to the server.
// query type | total | avg per hour | % of all queries
select       641 k     27.342 k    14.68%
update       280 k     11.960 k     6.42%
set option   259 k     11.032 k     5.92%
change db    258 k     11.000 k     5.91%
insert       169 k      7.216 k     3.87%
delete        20 k      840.038     0.45%
// items flagged in red to indicate potential problems / areas for improvement:
Slow_queries 244 The number of queries that have taken more than long_query_time seconds.
Innodb_buffer_pool_reads 78 The number of logical reads that InnoDB could not satisfy from the buffer pool and had to do a single-page read.
Handler_read_rnd 7,456 k The number of requests to read a row based on a fixed position. This is high if you are doing a lot of queries that require sorting of the result. You probably have a lot of queries that require MySQL to scan whole tables or you have joins that don't use keys properly.
Handler_read_rnd_next 79 G The number of requests to read the next row in the data file. This is high if you are doing a lot of table scans. Generally this suggests that your tables are not properly indexed or that your queries are not written to take advantage of the indexes you have.
Slow_launch_threads 2 The number of threads that have taken more than slow_launch_time seconds to create.
Created_tmp_disk_tables 35 k The number of temporary tables on disk created automatically by the server while executing statements. If Created_tmp_disk_tables is big, you may want to increase the tmp_table_size value to cause temporary tables to be memory-based instead of disk-based.
Select_full_join 7,460 The number of joins that do not use indexes. If this value is not 0, you should carefully check the indexes of your tables.
Select_range_check 3 The number of joins without keys that check for key usage after each row. (If this is not 0, you should carefully check the indexes of your tables.)
Sort_merge_passes 11 k The number of merge passes the sort algorithm has had to do. If this value is large, you should consider increasing the value of the sort_buffer_size system variable.
Opened_tables 7,089 The number of tables that have been opened. If opened tables is big, your table cache value is probably too small.
Table_locks_waited 57 k The number of times that a table lock could not be acquired immediately and a wait was needed. If this is high, and you have performance problems, you should first optimize your queries, and then either split your table or tables or use replication.
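For context, here's a rough, untested sketch of how I've been trying to make sense of these counters: pulling SHOW GLOBAL STATUS / SHOW GLOBAL VARIABLES and turning a couple of the raw numbers into ratios. The connection details are placeholders, and whether the cache variable is table_cache or table_open_cache depends on the MySQL version.

```php
<?php
// Rough, untested sketch: compute a couple of ratios from SHOW GLOBAL STATUS
// and SHOW GLOBAL VARIABLES instead of staring at the raw counters above.
// Connection details are placeholders.

$db = new mysqli('localhost', 'monitor', 'secret', 'mysql');

function keyValueRows($db, $sql)
{
    $out = array();
    $res = $db->query($sql);
    while ($row = $res->fetch_row()) {
        $out[$row[0]] = $row[1]; // Variable_name => Value
    }
    $res->free();
    return $out;
}

$status = keyValueRows($db, 'SHOW GLOBAL STATUS');
$vars   = keyValueRows($db, 'SHOW GLOBAL VARIABLES');

// Share of implicit temporary tables that spilled to disk -- what the
// Created_tmp_disk_tables warning above is really about.
$tmpDiskRatio = $status['Created_tmp_disk_tables'] / max(1, $status['Created_tmp_tables']);

// Table-cache churn: tables opened per second since startup.
$openedPerSec = $status['Opened_tables'] / max(1, $status['Uptime']);

// The cache size variable is table_open_cache on newer servers, table_cache on older ones.
$tableCache = isset($vars['table_open_cache']) ? $vars['table_open_cache'] : $vars['table_cache'];

printf("temp tables going to disk: %.1f%%\n", 100 * $tmpDiskRatio);
printf("tables opened per second: %.2f (cache size: %s)\n", $openedPerSec, $tableCache);
```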