sneakyimp;11021721 wrote:
(I'm so glad for this "opportunity" :p)
Yay for you! \o/
sneakyimp;11021721 wrote:
The code I'm dealing with has many instances where developers neglected to use these distinct methods and instead inserted or updated records by manually constructing SQL:
Well, isn't that exactly the kind of code opportunity you're glad for? In other words: rewrite it! And should it for some reason make sense to have direct SQL code, you'd rewrite it by routing it through someInstance->queryRead() or someInstance->queryWrite() depending on read/write query type, so that the distinction becomes clear. But do use cooler names than queryRead and queryWrite.
sneakyimp;11021721 wrote:
So I'm wondering a few things:
Q1: Is there a MySQL replication/clustering scheme where the client doesn't need to route reads to a slave and writes to the master?
Not as far as I know, but I use stuff that others administrate... But personally I'd rather ask this question at a mysql forum.
sneakyimp;11021721 wrote:
I.e., is there some other type of replication/clustering (hopefully free) where I can just leave all those manual SQL inserts/updates/deletes alone and just let Joomla speak to whichever database it is using without worrying about reads vs. writes?
Perhaps there is - Have you checked PostgreSQL? They do have middleware solutions (as in all queries pass the middleware - writes are sent to all dbs, reads to one). Short explanation of options here (PostgreSQL Docs).
sneakyimp;11021721 wrote:
Q2: Is it feasible to reliably sniff a query to determine if it should go to the master?
Yes. If a statement starts with DESCRIBE, SHOW, SELECT (others?) - then it's a read. If not, it's a write. However, do note that this should be sniffed ONCE and NOW, not later and always. You did start clustering stuff for performance issues after all... In other words, use shell, your text editor or a php script to search for all occurances of those words, sift out the queries and then see:
sneakyimp;11021721 wrote:
(I'm so glad for this "opportunity" :p)
sneakyimp;11021721 wrote:
INSERT...SELECT queries or even more bizarre selects with inserts and joins or something that might require a more nuanced approach
Doesn't matter. INSERT ... SELECT is an insert statement. Yes, some of the inserted data is provided by means of a select, which in turn could come from wherever, but
1. It might just be one select query collecting data (from whichever server) and then issuing the insert (to the master)
2. An actual INSERT ... SELECT ... which could be split up to the above
3. An actual INSERT ... SELECT which, most likely for good reasons, is left as is and sent to the master. It is after all a write statement. And moving data from slave to master seems slower than handling it within master. But wether this is noticed might come down to network quality for all I know. And perhaps there is some obscure case where it would be faster selecting from slave to insert on master, but in that case I'm certain you'll pull your hair because it's allready too slow and then realizing that's how you have to do it, rather than starting in that end of the weird tree.
sneakyimp;11021721 wrote:
Any thoughts about an effective pattern-matching scheme would be much appreciated.
#describe|select|show#i
I'm guessing you had something more elaborate and fancy in mind, but in my opinion you have to search for these keywords only (and possibly others - go check the reserved words page of the docs to dig up all read-related keywords). Should you mix them with anything else, you may miss out on stuff for the simple reason that you might have code such as
$q = "SELECT";
/* bla bla bla */
$q .= "stuff FROM tbl";
sneakyimp;11021721 wrote:
Q3: Must there always be a one-to-one relationship between slave db servers and application servers?
No. Your code will create the db conncetions (new PDO, new mysqli...) which means your code determine where each connection goes. You could have master + 2 slaves with 3 web servers, where one issues reads to the master. Or master + slave with 3 web servers that send all writes to master but all reads to slave. Or 2 web servers which send everything to master, but if/when master fails - former slave is now master (= failover only).
I can imagine a scenario where one application server is enough but the database is getting worked to death so it might be helpful to have one application server connect to one of a few slaves. Is that right?[/QUOTE]
Everything is possible. It will depend entirely on the entire eco system: use cases, application code, server configuration, caching.
For example
1. Turning off query cache MAY improve performance (too high number of queries for decent cache hit ratio) => save time by never writing to query cache.
2. If application server runs smooth, but db doesn't: are you utilizing memcached to handle rarely changed data (e.g. rows memcached as serialized objects). Are you utilizing memcached to handle possibly more frequently updated data, but which also exist on every single page? One memcached server for several web servers will most likely work fine (direct access to RAM).
3. If application server runs smooth, are you using some kind of page caching techniques that allows you to store both entire pages and parts of pages as html for direct passthrough to client without needing any php parsing and thus will access no databases? That is, pages or parts thereof which can be considered static (within certain limits). Smarty's abilities in this regard is one of the reasons I like it, and it handles cache in two stages. The first is called pre-compiled templates (the template is turned into php code and looks butt ugly) - this usually only needs to be done if you change the template file (as in modify presentation code). The second stage is caching the output of a template, which is done as (almost) plain html. There are some constructs which allows for dynamic updates of the cache.
4. Nothing beats a db to death like inefficient queries as requests and data increase. In the same way a terribly slow O(log n) search will always beat a very fast O(n2) search for n > than some threshold. Make sure everything is indexed (but only if needed). Make sure you do processing once, before write, if possible, rather than always on read.
Do note that if you use any kind of file caching on load balanced web servers (such as that built into smarty) you may have to set a script to listen for UDP multicasts for files to clear so that one server can issue cache invalidation on all servers.