connection pooling?

sneakyimp · Jan 18, 2007

ok i just suffered thru code review. some PHD critized my code for not using connection pooling. i've googled around and get the feeling that PHP doesn't offer connection pooling.

is this really a huge problem with my php project? i'm not using persistent connections because i seem to recall 'too many connections' errors coming up all the time.

i could really use some help here

dougal85 · Jan 18, 2007

Could he mean you are opening multiple mySQL connections? instead of opening one and using it.

sneakyimp · Jan 18, 2007

i have encapsulated all my database connection code in a database class. exactly one instance of that class is instantiated on any given page and is used for EVERY query.

this 'admittedly cursory' code review also made that mistake regarding my code. i am only opening one db connection per page access.

MarkR · Jan 18, 2007

Connection pooling is something which multithreaded programs can use. PHP usually runs in a multi-process scenario, which is not compatible with connection pooling.

The only thing you have similar to connection pooling, is persistent connections. These however, need to be used carefully as it's easy to screw things up with them. The most common problems are:

Too many connections - if your server is serving more than one application, or your application is using more than one database, that number will be multiplied by the number of processes. Use just ONE connection across the entire app, and don't run any other apps on the same server (or rather, use persistent connections on just one of them).

You should definitely check MaxClients (e.g. if using Apache), and ensure that it's not too high. More than 50 is probably too many. Remember that this is the maximum number of PHP processes you can have, with one persistent connection each.

Connection state hanging over from previous requests

PHP doesn't make any attempt to reset the state of a persistent connection - therefore it's vital that you don't do anything stateful. This pretty much includes transactions.

Or rather, you COULD do something stateful, but be very careful that you reset the connection to a sane state when your page finishes (Even in the case of an error) - register_shutdown_function could help here, perhaps doing a ROLLBACK.

Any parameters you set (e.g. SET NAMES utf8) should be set consistently after every connection is created. Otherwise, you might run into trouble using old persistent connections.

If you're using a local MySQL database, I wouldn't bother using persistent connections, as connecting locally over a Unix socket (Or perhaps NT named pipe) is very fast.

On the other hand, something like an Oracle server located in Zimbabwe over a VPN has a longer connection time.

If you're running into serious connection performance problems, consider using an intermediate proxying server which does its own connection pooling, if that is possible.

Mark

Sxooter · Jan 18, 2007

Your PHD friend doesn't understand that in PHP, connection pooling (not persistant connections) is NOT something handled in your code. If you want connection pooling you use an add on of some kind.

In PostgreSQL, there's pgpool. pgpool opens X connections to the database. These connections are all for the same user, same db, etc... So you have to work within one db, but can have multiple schemas if need be. The actual REAL connection to a database is what takes time and costs money.

Then you can use pg_pconnect (persistent connection) or pg_connect to connect to the pg_pool application. The connections look like this:

apache/php <--> 150 connections <--> pg_pool <--> 25 connections <--> postgresql server

So, pooling is not handled in the app, it's handled in the architecture. If you run pg_pool on the normal postgresql port (5432) and move postgres to another port, the app doesn't even know it's pooling in any way.

There are other ways to do connection pooling.

But what bothers me is the slavish devotion your PhD friend seems to have to the "one true way". The fact is that to have pooling built into an app when one hasn't examined the usage patterns smacks of one size fits all thinking. It may well be that the more rugged and reliable non-pooling methodology is a better fit for what you're doing. Like, if you've got a corporate intranet for 100 employees, and only 3 or 4 will ever connect at a time, then pooling is not required, and increases complexity uneccesarily.

It's like complaining that a golf cart doesn't have cruise control and air conditioning.

sneakyimp · Jan 18, 2007

This dude is NOT my friend

I was instructed to develop this project to support scaling to very large scales so connection pooling might eventually become very important. I was smart enough to encapsulate every database access inside a class. As far as I know, I haven't implemented queries at all which should result in 'statefulness'. They are all simple INSERT/UPDATE/DELETE queries. Only one connection per PHP page should be made in any situation (unless there's an iframe holding another php page which happens in some cases).

Thanks for the clarifications. Can I feel better now about this? I know my app is slow at the moment and have replied to this affront by complaining that the 'code review' doesn't address the real reason for the slowness which is a thorny computational problem at the root of things (more here)

Sxooter · Jan 18, 2007

sneakyimp wrote:
This dude is NOT my friend

I was instructed to develop this project to support scaling to very large scales so connection pooling might eventually become very important. I was smart enough to encapsulate every database access inside a class. As far as I know, I haven't implemented queries at all which should result in 'statefulness'. They are all simple INSERT/UPDATE/DELETE queries. Only one connection per PHP page should be made in any situation (unless there's an iframe holding another php page which happens in some cases).

Thanks for the clarifications. Can I feel better now about this? I know my app is slow at the moment and have replied to this affront by complaining that the 'code review' doesn't address the real reason for the slowness which is a thorny computational problem at the root of things (more here)

OK, I've looked at that thread. You're making two mistakes in general I can see.

1: You're using distinct. If you have to use distinct, you're spending a lot of CPU on sorting / uniqueing that you shouldn't have to. without the distinct, the db can just grab the data and go. As soon as you add distinct, it has to order the data then uniq it. That can be very expensive, and is an N² type thing. I.e. as you double the size of your data set you exponentially increase the time to order / distinct the set.

2: You're ordering by rand, and this too is an N² issue.

Solution: Change your data set if you have to, so that you don't need distinct to get distinct values. Then, instead of ordering by rand, get an approximate count of the entries in the table (count(*) can be expensive) and use the php random function to get a set of numbers to use with offset limit. i.e. if there are 34567 rows in the db, then use random(0,34567-x) to get an offset where x is the size of the result set you want +1. So for 10 results, that would be 34567-11.

Then, without the distinct in there, you'll have something like:

select * from mytable where (....) limit 10 offset $offset

you wanna get rid of order by rand especially.

worry about connection pooling when connecting takes up a noticeable chunk of your execution time.

Sxooter · Jan 18, 2007

Just a note. if e.id is not unique, but you need only one entry for what you're doing, then you might do better making a table where e.id is foreign keyed to it and the id entry in that table is a primary key. It doesn't have to have anything else. By having a master table with a pk'd id, you'll get much faster response since you won't have to order by or distinct that field any more.

sneakyimp · Jan 18, 2007

I really appreciate the input Sxooter! If you read thru to the bitter end of that thread, you'll see that we use a much faster approach for obtaining a random event but that the overall query is still puzzingly slow. but can we keep posts regarding that issue in the other thread?

As for connection pooling, I agree. Right now the connection time isn't even noticeable--probably because it's connecting to localhost. Am I right in thinking I could implement connection pooling eventually by either rewriting my database class or perhaps just reconfiguring my server to use a pooling application?

Sxooter · Jan 18, 2007

sneakyimp wrote:
As for connection pooling, I agree. Right now the connection time isn't even noticeable--probably because it's connecting to localhost. Am I right in thinking I could implement connection pooling eventually by either rewriting my database class or perhaps just reconfiguring my server to use a pooling application?

Maybe. Connection pooling isn't something that php is particularly good at. Unless there's been improvements in 6.0 or something I'm not familiar with. add on pooling might help. The real reason it's not an issue yet is because you don't have 200 users hitting your system at the same time. when the number of concurrent users starts to rise is when connection pooling starts to matter.

MarkR · Jan 18, 2007

PHP isn't compatible with connection pooling due it its architecture.

Instead, you end up with a pool of processes / threads with one connection each. This behaves similarly to connection pooling, although you need to be sure that there are enough connections for all of them (Think MaxClients).

So whereas a J2EE server typically does connection pooling, it can only do this because it's in a threaded environment. PHP is not and would require a total change of architecture to do this.

Mark

sneakyimp · Jan 22, 2007

Sxooter wrote:
Maybe. Connection pooling isn't something that php is particularly good at. Unless there's been improvements in 6.0 or something I'm not familiar with. add on pooling might help. The real reason it's not an issue yet is because you don't have 200 users hitting your system at the same time. when the number of concurrent users starts to rise is when connection pooling starts to matter.

I agree that connection pooling isn't even the problem yet. The site is slow with one person (me) accessing it. Am I right in thinking that it doesn't even matter if php can pool connections? I route EVERY SINGLE DATABASE CONNECTION AND QUERY through a database class. And my PHP code
a) doesn't attempt to connect more than once per page access
b) doesn't attempt to spawn any additional threads of execution

Am I right in thinking I could probably use ANY database technology as long as my basic methods were supported? I could just rewrite my database class to connect to Oracle or MySQL server or whatever the hell supports connection pooling?

And lastly, can anyone recommend some good reading for PHP and multithreading and why it's not supported?

Roger_Ramjet · Jan 22, 2007

Why PHP does not multi-thread.

Sxooter · Jan 22, 2007

First off, I wanna point out that neither connection pooling nor threading are the magic pixie dust some folks make them out to be. If you're running a lot of light weight queries at the same time, then they can be a big advantage over the multi-process / non-pooling PHP methodology. But, as the weight of individual pages / db queries increase, the lack of pooling and multithreading become less and less of an issue.

So, it's about what you plan on doing. I write query reporter apps at work quite a bit. The app connects, and 320 seconds later a report comes out. I don't think the connection time is particularly critical there.

Roger_Ramjet · Jan 23, 2007

Connection pooling is only going to make a difference when there is a high overhead for opening a connection. If both servers are on the same box then you have no network latency to slow you down. It is only when there is high network latency that connection pooling returns any benefits because it eliminates the protocol handshaking at connection open.

Sxooter · Jan 23, 2007

It's also useful when you've got hundreds or thousands of front end web connections that need to share a database. It allows you to map those thousands of front ends to dozens of backend database connections. There ARE some technical solutions for this for php and certain databases.

connection pooling?

Ssneakyimp

Ddougal85

Ssneakyimp

MMarkR

SSxooter

Ssneakyimp

SSxooter

SSxooter

Ssneakyimp

SSxooter

MMarkR

Ssneakyimp

RRoger_Ramjet

SSxooter

RRoger_Ramjet

SSxooter