right on. eastings...northings. leave it to the military to come up with a practical simplification. i've been reading those links about UTM.
If I understand correctly that would seem to reduce this nasty bit:
$sql = " SELECT e.id
FROM events e, zips z WHERE
e.zip = z.zip
AND (" . MILES_PER_RADIAN .
"*(2*asin(sqrt(POWER(sin((" . $my_latitude_radians . " -z.lat_rad)/2),2) + cos(" . $my_latitude_radians . ")*cos(z.lat_rad)*POWER(sin((" . $my_longitude_radians . " - z.long_rad)/2),2)))) < " . $x . ")";
to the considerably less nasty:
$sql = " SELECT e.id
FROM events e, zips z WHERE
e.zip = z.zip
AND (sqrt(POWER((" . $my_easting . " - z.easting), 2) + POWER((" . $my_northing . " - z.northing), 2)) < $distance_factor)";
and we would not have to compromise too much accuracy.
On the other hand, we still have the issue of possibly running the calculation on every single event record in my database to determine which are within a certain distance, right? I'm really wondering if the ORDER BY RAND() combined with a LIMIT 0, 5 is really going to reduce the amount of time spent on this query. Seems to me that MySQL would need to first calculate the distances for all the records before knowing which records would be included so that it could then proceed to order them. Can anyone point me to implementation details of ORDER BY RAND and LIMIT?
In the 2nd article I linked, the author reported this:
For most purposes on smaller database tables, the following will work fine:
$random_row = mysql_fetch_row(mysql_query("select * from YOUR_TABLE order by rand() limit 1"));
$random_row will be an array containing the data extracted from the random row. However, when the table is large (over about 10,000 rows) this method of selecting a random row becomes increasingly slow with the size of the table and can create a great load on the server. I tested this on a table I was working that contained 2,394,968 rows. It took 717 seconds (12 minutes!) to return a random row.
2.4 million records is considerably less than 540 million. Seems to me I need something to reduce the number of records under consideration before even attempting a random row type thing.
I really like weedpacket's point here:
A precondition for (x1,y1) being within n miles of (x2,y2) is that the difference between x1 and x2 cannot be more than n miles itself. Ditto for y1 and y2. This adds additional tests that can be made to filter the potential results.
If I could first use simple comparisons like that--or better yet check indexed values in my event table to weed out events before the distance calculation gets applied, that might speed things up much more dramatically.
for instance:
$max_distance = 50; // find everything within 50 miles
$max_longitude_difference = $max_distance / MILES_PER_LONGITUDE_RADIAN; // use some safe value for MPLR
$max_latitude_difference = $max_distance / MILES_PER_LATITUDE_RADIAN; // this should be constant
$sql = " SELECT e.id
FROM events e, zips z
WHERE e.zip = z.zip
AND index_field < 5
AND (ABS(" . $my_longitude_radians . "-z.longitude_radians) < " . $max_longitude_difference . ")
AND (ABS(" . $my_latitude_radians . "-z.latitude_radians) < " . $max_latitude_difference . ")
AND (" . MILES_PER_RADIAN .
"*(2*asin(sqrt(POWER(sin((" . $my_latitude_radians . " -z.lat_rad)/2),2) + cos(" . $my_latitude_radians . ")*cos(z.lat_rad)*POWER(sin((" . $my_longitude_radians . " - z.long_rad)/2),2)))) < " . $x . ")
ORDER BY RAND()
LIMIT 0,5";
Would those extra checks before the nasty calculation cause mysql to skip the calculation for records that are obviously out of bounds?
There seem to be several components to the performance problem here. One is reducing the nastiness of the calculations being done. Hopefully this can be done by checking easier calculations as in that last bit of code i just typed. A second is to use ORDER BY RAND() and LIMIT to hopefully skip the calculation on all the records. I'm skeptical that approach will actually limit the amount of calculations that MySQL will perform. A third approach is summary tables of some kind.
I have no experience at all with temporary tables, but it had occurred to me that when a user selects a zip code Z and radius R, I could create a temporary table that calculated the distance to all zip codes within R miles of Z. If they choose no radius, then no distance calculation is necessary.