Ok before I even get into any geosptial calculations or extra libraries, I think I am bumping up against a pretty fundamental wall even with the simplest approach. My performance is about ten times faster now that I removed the zip field so that all i have is 4 fixed-length fields: id, f_id, longitude, and latitude:
CREATE TABLE `test_zip_assoc` (
`id` int(12) unsigned NOT NULL auto_increment,
`f_id` int(12) unsigned NOT NULL default '0',
`lat_radians` decimal(6,5) NOT NULL default '0.00000',
`long_radians` decimal(6,5) NOT NULL default '0.00000',
PRIMARY KEY (`id`),
KEY `lat_radians` (`lat_radians`),
KEY `long_radians` (`long_radians`)
) TYPE=MyISAM;
I also reduced longitude and latitude to only 6 significant digits (roughly 1/25th of a mile).
I set up a script to choose a zip code at random and then randomly select 5 f_id'swithin 50 miles of it and then repeat 50 times. The time required for the main query is about 1000 times faster than what I was originally trying to do:
$sql = "SELECT id FROM test_zip_assoc WHERE
lat_radians > " . ($zip_lat - $max_dist_radians) . " AND
lat_radians < " . ($zip_lat + $max_dist_radians) . " AND
long_radians > " . ($zip_long - $max_dist_radians) . " AND
long_radians < " . ($zip_long + $max_dist_radians);
HOWEVER It starts to become too slow once 10 million records are in the database. This is very encouraging but I still need to do better. I suspect that the slowness might be related to the fact that the number of records returned from that query grows to around 50,000 when there are 10 million records in the db and so the large result set being copied from mysql to php could be my problem. On the other hand, fifty thousand 12-digit numbers doesn't sound like a hell of a lot of memory to me. The problem might be the growing size of the database and indexes.
I've attached some graphs and data u might find interesting.
Can anyone suggest further improvements? I've managed a many-fold increase in speed over my original approach but I want more more more. I'm starting to wonder if mysql configuration stuff (like memory limits and such) might be tweaked to get better response with growing database.