I'm essentially an atheist, but this seemed appropriate:
1 Corinthians 13:11: "When I was a child, I spake as a child, I understood as a child, I thought as a child: but when I became a man, I put away childish things."
I've spent some time looking at Amazon EC2, and I have some experience with Rackspace. Now I want to understand how I might use the cloud to build a PHP application that can serve millions of users per day. I've noticed a few things about cloud services:
- They let you instantiate virtual servers either using PHP code or manually through a control panel.
- EC2 offers a load balancer that 'automatically' distributes requests across a collection of virtual servers. Apparently it can also create new computing instances using Auto Scaling. Rackspace has a short tutorial on load balancing, but it does not appear to take any session-related information into account, so I'm not sure how well it will work with a distributed database.
- I haven't seen any kind of auto-scaling mass storage system for Rackspace, but Amazon has SimpleDB, which is non-relational.
- Zend Framework tries to abstract interfacing with the cloud for File Storage, DB Storage, and Job Queuing.
I want to know the ins and outs of this before attempting to build anything. If anyone has additional input here, I'd sure love to get some more information. My questions:
Non-relational DB -- what does it mean?
SimpleDB is non-relational. As I understand it, this means no JOINs, no GROUP BY, no ORDER BY, no indexes. You can get more than one record at once with some simplified WHERE-type stuff. What other sorts of limitations can I expect? Is this in any way advantageous? I guess it's kind of nice that I can just create some kind of arbitrary record with multiple fields and associate it with some ID.
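To check that I have the data model right, here is roughly how I picture it. This is only a sketch: plain PHP arrays stand in for SimpleDB domains, nothing here calls the real SimpleDB API, and the names selectWhere and statusesWithAuthor are ones I made up for illustration.

```php
<?php
// Hypothetical stand-ins for two SimpleDB-style "domains": each item is an ID
// mapped to a bag of attributes, and items need not share the same fields.
$users = [
    'u1' => ['name' => 'Alice', 'city' => 'Los Angeles'],
    'u2' => ['name' => 'Bo', 'city' => 'Shanghai', 'lang' => 'zh'],
];
$statuses = [
    's1' => ['user_id' => 'u1', 'text' => 'Hello'],
    's2' => ['user_id' => 'u2', 'text' => 'Ni hao'],
    's3' => ['user_id' => 'u1', 'text' => 'Lunch'],
];

// Simplified WHERE: keep the items whose attribute equals the given value.
function selectWhere(array $domain, string $attr, string $value): array
{
    return array_filter($domain, function (array $item) use ($attr, $value): bool {
        return isset($item[$attr]) && $item[$attr] === $value;
    });
}

// What a JOIN would do in MySQL has to happen in application code:
// query one domain, then look up the related item in the other by hand.
function statusesWithAuthor(array $users, array $statuses, string $userId): array
{
    $rows = [];
    foreach (selectWhere($statuses, 'user_id', $userId) as $id => $status) {
        $rows[$id] = $status + ['author' => $users[$userId]['name'] ?? null];
    }
    return $rows;
}

$feed = statusesWithAuthor($users, $statuses, 'u1');
```

If that picture is right, every relationship I would normally express with a JOIN turns into extra round trips plus stitching code like this, which is part of why I'm asking about latency below.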
Is this SimpleDB beast going to respond as quickly as MySQL does?
Seems to me that a globally available distributed database which is being accessed by potentially hundreds of thousands of computers might have some latency issues. Additionally, the access methods I've seen would appear to be routed through DNS requests, through load balancers, etc. I'm guessing that I can't expect the tens of milliseconds access times that I get from a MySQL database running on localhost or on the same LAN as my server.
What about data coherency?
I'm also guessing that this black box SimpleDB might have some issues in keeping data consistent between users in China and users in Los Angeles. For instance, if I enter a new status in my social networking application, how soon before my pen pal in Shanghai can see it? Are there problems that people have experienced due to this data incoherency?
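The behaviour I'm worried about can be shown with a toy model. This is purely a sketch under my own assumptions: the Replica and ReplicatedStore classes are hypothetical, the replicate() call stands in for whatever background propagation a real service does, and I don't know whether SimpleDB actually works this way.

```php
<?php
// One replica per region; each holds its own copy of the data.
final class Replica
{
    private $data = [];

    public function put(string $key, string $value): void
    {
        $this->data[$key] = $value;
    }

    public function get(string $key): ?string
    {
        return $this->data[$key] ?? null;
    }
}

final class ReplicatedStore
{
    /** @var Replica[] keyed by region name */
    private $replicas;
    /** @var array[] writes not yet copied to the other regions */
    private $pending = [];

    public function __construct(array $replicas)
    {
        $this->replicas = $replicas;
    }

    // A write lands on the local replica immediately and is queued for the rest.
    public function write(string $region, string $key, string $value): void
    {
        $this->replicas[$region]->put($key, $value);
        $this->pending[] = [$region, $key, $value];
    }

    // A read only sees what the local replica currently holds.
    public function read(string $region, string $key): ?string
    {
        return $this->replicas[$region]->get($key);
    }

    // Copy every queued write to all replicas other than its origin.
    public function replicate(): void
    {
        foreach ($this->pending as [$origin, $key, $value]) {
            foreach ($this->replicas as $region => $replica) {
                if ($region !== $origin) {
                    $replica->put($key, $value);
                }
            }
        }
        $this->pending = [];
    }
}

$store = new ReplicatedStore([
    'los_angeles' => new Replica(),
    'shanghai'    => new Replica(),
]);
$store->write('los_angeles', 'status:me', 'At the beach');
$before = $store->read('shanghai', 'status:me'); // null: not propagated yet
$store->replicate();
$after = $store->read('shanghai', 'status:me');  // 'At the beach'
```

What I can't tell from the outside is how long the gap between $before and $after is in practice, and whether the application is expected to cope with it.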
What about sessions?
Is it reasonable to handle sessions using SimpleDB, or do I need to handle them on the local system and also make my load balancer session-aware? I know very little about session-aware load balancing, so any input here would be much appreciated -- especially if it describes an application architecture for handling sessions in a cloud system.
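One architecture I can imagine is keeping session data in a store that every web server shares, so the load balancer would not need to care which server a request lands on. Below is a minimal sketch using PHP's SessionHandlerInterface. The ArrayObject is only an in-memory stand-in for the shared store (SimpleDB, a database, or a cache), and the class name SharedStoreSessionHandler is my own invention.

```php
<?php
// Session handler that keeps session data in a store shared by all servers.
// ASSUMPTION: $store is an in-memory stand-in; a real deployment would put
// network calls to some shared backend behind the same four operations.
final class SharedStoreSessionHandler implements SessionHandlerInterface
{
    private $store;

    public function __construct(ArrayObject $store)
    {
        $this->store = $store;
    }

    public function open($path, $name): bool
    {
        return true; // nothing to connect to in this sketch
    }

    public function close(): bool
    {
        return true;
    }

    // PHP expects an empty string when the session does not exist yet.
    public function read($id): string
    {
        return isset($this->store[$id]) ? $this->store[$id]['data'] : '';
    }

    public function write($id, $data): bool
    {
        $this->store[$id] = ['data' => $data, 'time' => time()];
        return true;
    }

    public function destroy($id): bool
    {
        unset($this->store[$id]);
        return true;
    }

    // Remove sessions older than $max_lifetime seconds; return how many.
    public function gc($max_lifetime): int
    {
        $removed = 0;
        foreach ($this->store->getArrayCopy() as $id => $entry) {
            if ($entry['time'] + $max_lifetime < time()) {
                unset($this->store[$id]);
                $removed++;
            }
        }
        return $removed;
    }
}

// Two handlers sharing one store play the role of two web servers.
$shared  = new ArrayObject();
$serverA = new SharedStoreSessionHandler($shared);
$serverB = new SharedStoreSessionHandler($shared);

$serverA->write('abc123', 'user|s:5:"alice";');
$seenByB = $serverB->read('abc123');
```

In a real application each server would register the handler with session_set_save_handler($handler, true) before calling session_start(). Whether a store like SimpleDB is fast and consistent enough to sit behind this is exactly what I'm unsure about.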
How does one deploy an application to an automatically created computing instance?
If I'm using some kind of load balancer to create and destroy virtual servers in response to fluctuating demand, I will need to deploy my PHP application (and possibly an httpd.conf with rewrite rules, a database, etc.) to each new instance that gets created. I know that you can create instance 'images' which contain some flavor of Linux and all of your other server configuration, but what if you have machine-specific values? How does one handle the setup of applications or data that are specific to a machine instance?
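The approach I'm picturing is to bake defaults into the image and overlay per-instance values when the machine boots. Here is a rough sketch of that idea. The buildConfig function, the config keys, and the hostnames are all hypothetical, and the lookup callable is a placeholder for wherever the per-instance values would really come from (on EC2, I assume something like instance metadata or user data).

```php
<?php
// Merge defaults baked into the machine image with values that are only
// known once a particular instance has booted.
// ASSUMPTION: $instanceLookup is a placeholder for a real per-instance
// source; here it is just a closure returning an array.
function buildConfig(array $imageDefaults, callable $instanceLookup): array
{
    // Ignore empty answers so a missing value never wipes out a default.
    $overrides = array_filter($instanceLookup(), function ($value): bool {
        return $value !== null && $value !== '';
    });

    return array_replace($imageDefaults, $overrides);
}

// Values that are the same on every server and can live in the image.
$defaults = [
    'app_env'     => 'production',
    'db_host'     => 'db.internal.example',
    'server_name' => 'unset',
];

// Fake per-instance lookup: this machine knows its name but not a DB host.
$fakeLookup = function (): array {
    return ['server_name' => 'web-07', 'db_host' => ''];
};

$config = buildConfig($defaults, $fakeLookup);
```

A boot script could run something like this once and write the result to a file the application reads, but I'd like to hear how people actually do it.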
I'm really hoping this thread might serve as a resource for folks who would like to learn about implementing scalable PHP applications using the Cloud. Any responses or anecdotes would be greatly appreciated.