Originally posted by drawmack
Check this out: http://www.enderswebdev.com/test/zipdistance.php

Drawmack, I was poking around and noticed this odd behavior from your code:

zip 44137 and 44138. I'll admit, these are two towns I've lived in and I have a good idea how far apart these two are.

Using your results, it shows these two zips being 11.715 miles apart. If you do 44137 and a radius of 20 miles, Olmsted Falls (44138) doesn't show up. But if you do 25 or 30 miles, it then appears.

    Of course you could save a bit of the work by doing all the degree-to-radians conversion and some of the trigonometry during the initial insertion: cos(38°N) isn't going to be changing any time soon, so you might as well just store 0.788011 in the database and use that instead of recalculating it every time.

    Maybe if you're prepared to tolerate some amount of error (e.g., "give or take 5km") (seeing has how geographic regions aren't geometric points, this seems like it could be reasonable), it might be possible to project the relevant section of the Earth's surface onto a flat plane in such a way that the old Euclidean metric sqrt((lat1-lat2)2+(lon1-lon2)2) could be used to get a "good enough" measure of the distance. Then you'd be storing the projected coordinates.

    (If you'd been using PostgreSQL, I'd've cited this but that would probably be too much for your purposes, as all you'd be using would be its distance operator.

      Originally posted by AstroTeg
      Drawmack, I was poking around and noticed this odd behavior from your code:

      zip 44137 and 44138. I'll admit, these are two towns I've lived in and I have a good idea how far apart these two are.

      Using your results, it shows these two zips being 11.715 miles apart. If you do 44137 and a radius of 20 miles, Olmsted Falls (44138) doesn't show up. But if you do 25 or 30 miles, it then appears.

      He snagged the code off the Devshed website link and reworked it calling it "his". Whether that code was good or not, who knows. Apparently it isn't according to your analysis AstroTeg?

      You know, when you get headstrong, take someone else's crappy code, then scream that you got the Holy Grail only to find out that you don't, you tend to get arrows thrown at your back 😃

        I'm not interested in bashing one's code - regardless of how it was created. My bigger concern is if one is using funky/buggy code to perform production operations, they may not be aware there is a problem.

        Weekpacket: thanks for the link. I'd like to give that code a whirl in the future.

        Although yesterday I had fun and converted a C++ latitude/longitude distance finding function for MySQL to a C flavored one for PostgreSQL. I got it to work and can be found in action here. Its doing the ugly "calculate every zip code distance" trick before handing out the results (the difference is that its all done in C code). It seems to go pretty quick...

          drawmack,

          I read your query and it seems pretty straightfoward.
          But if I have a table with store names and addresses(and zip code), how do I cross reference those store zip codes with the stores I have in my store table?

          Thx

            Originally posted by Manat
            drawmack,

            I read your query and it seems pretty straightfoward.

            I'd be cautious using that query. It looks like it me have a bug. I haven't had time to figure out the source, but its not bringing back all the results it is supposed to.

              Originally posted by AstroTeg
              Drawmack, I was poking around and noticed this odd behavior from your code:

              zip 44137 and 44138. I'll admit, these are two towns I've lived in and I have a good idea how far apart these two are.

              Using your results, it shows these two zips being 11.715 miles apart. If you do 44137 and a radius of 20 miles, Olmsted Falls (44138) doesn't show up. But if you do 25 or 30 miles, it then appears.

              I snagged by bd from the census beareu which is not a garenteed acruate source. But for garenteed acuracy you have to pay $20.00 for the data and I didn't feel like paying for something that I was only using as a sample.

                Originally posted by kkobashi
                He snagged the code off the Devshed website link and reworked it calling it "his". Whether that code was good or not, who knows. Apparently it isn't according to your analysis AstroTeg?

                I posted links to the devshed page in my original post and stated that this is where I got the query the php code is mine as the page I got it from was a cold fusion page.

                You know, when you get headstrong, take someone else's crappy code, then scream that you got the Holy Grail only to find out that you don't, you tend to get arrows thrown at your back 😃

                Of course you know that my system runns off of a db and this is a problem with the data not the code. Of course when you make assumptions and then go posting inflamatory statements based on those assumptions you've got to expect this soemtimes.

                  Originally posted by Manat
                  drawmack,

                  I read your query and it seems pretty straightfoward.
                  But if I have a table with store names and addresses(and zip code), how do I cross reference those store zip codes with the stores I have in my store table?

                  Thx

                  Just feed the store's zip and the distance radius you want in miles to getZipsWithin function.

                  Take the zips from the array returned and feed those back into your locations table by selecting the stores where zip in (...) and you'll get a list of stores with the matching zips.

                    Originally posted by drawmack
                    I snagged by bd from the census beareu which is not a garenteed acruate source. But for garenteed acuracy you have to pay $20.00 for the data and I didn't feel like paying for something that I was only using as a sample.

                    Right. But you missed my point.

                    Your distance algorithm correctly calculates the distance.

                    It does not correctly calculate the radius.

                    Test it.

                    Enter 44137 as the zip and search in 10 mile, 15 mile, 20, and 25 mile radius.

                    Look at the results and find when 44138 appears. Also note the calculated distance. Its about 11 miles. Notice when it appears. It should show up in any query over 10 miles. But you have to query with 24 miles for this town to show up.

                    Sounds like your code has a bug in its radius detection (or its doing too much trimming on the lat/long and removing some records prematurely from the results). Make sense?

                      Originally posted by drawmack
                      Of course you know that my system runns off of a db and this is a problem with the data not the code. Of course when you make assumptions and then go posting inflamatory statements based on those assumptions you've got to expect this soemtimes.

                      Actually, re-read my post. Its a problem with your radius calculation(s), not the data.

                      If there was a data problem, then the mileage would just be wrong. But your mileage appears correct as calculated for your data set. The problem you're dropping valid records which are supposed to be part of the results.

                        Originally posted by AstroTeg
                        Enter 44137 as the zip and search in 10 mile, 15 mile, 20, and 25 mile radius.

                        Look at the results and find when 44138 appears. Also note the calculated distance. Its about 11 miles. Notice when it appears. It should show up in any query over 10 miles. But you have to query with 24 miles for this town to show up.

                        Sounds like your code has a bug in its radius detection (or its doing too much trimming on the lat/long and removing some records prematurely from the results). Make sense?

                        okay I got ya now, I'll look into that. I thought you were pointing out a bug I already found that was not a bug in the code but rather faulty data which give out the wrong distance between some towns.

                          Originally posted by drawmack
                          okay I got ya now, I'll look into that. I thought you were pointing out a bug I already found that was not a bug in the code but rather faulty data which give out the wrong distance between some towns.

                          I'd look into the where clause; what is the spatial relationship between these two codes (i.e., what is the bearing from one to the other)? That WHERE clause will only find regions within a square - and squares are not circles. So you'll either risk missing points that are inside the circle but outside the square, or inside the square but outside the circle (depending on whether your square is an under- or overestimate). Then again, it is assuming that the Earth is perfectly spherical, with a radius of 4000 (U.S. statute) miles, and that you're on the equator; so it's an overestimate.

                          Can you not use the distance calculation you're returning directly in the WHERE clause itself - without having to duplicate the code?

                            Originally posted by Weedpacket
                            Can you not use the distance calculation you're returning directly in the WHERE clause itself - without having to duplicate the code?

                            Yeah I can, I'll try that and see what it does to performance.

                              Well I wound up having to repeat the code for the calculation as when the where clause was executed it didn't know what the distance field was, however from the speed of execution it appears that mysql is only donig the calculation once. Just Replace the query in the original posted code with this code

                              <?php
                                  $sql  = "SELECT zip_code,\n";
                                  $sql .= "ROUND((ACOS((SIN(" . $zipInfo->latitude . "/57.2958) * ";
                                  $sql .= "SIN(latitude/57.2958)) + (COS(" . $zipInfo->latitude . "/57.2958) * ";
                                  $sql .= "COS(latitude/57.2958) * ";
                                  $sql .= "COS(longitude/57.2958 - " . $zipInfo->longitude . "/57.2958)))) ";
                                  $sql .= "* 3963, 3) AS distance\n";
                                  $sql .= "FROM zip_codes\n"; 
                                  $sql .= "WHERE ROUND((ACOS((SIN(" . $zipInfo->latitude . "/57.2958) * ";
                                  $sql .= "SIN(latitude/57.2958)) + (COS(" . $zipInfo->latitude . "/57.2958) * ";
                                  $sql .= "COS(latitude/57.2958) * ";
                                  $sql .= "COS(longitude/57.2958 - " . $zipInfo->longitude . "/57.2958)))) ";
                                  $sql .= "* 3963, 3) <= " . $miles . "\n";
                                  $sql .= "ORDER BY distance\n";
                              ?>

                              I tested the known error listed above in the thread and 44138 showed up at 12 miles which it should.

                                Originally posted by drawmack

                                I tested the known error listed above in the thread and 44138 showed up at 12 miles which it should.

                                Groovy!

                                If I'm doing my math right and I've read your code correctly, you're issuing 3 queries:

                                Query 1: Look up source zip's lat/long coordinates

                                Query 2: (the one that got fixed) Retrieve the list of matching zips

                                Query 3: Retrieve zip's meta data (city, state, distance, etc). Here would be where you search for matching stores, etc.

                                What I'm curious to play with is combining query 2 and 3 together. The page on rsbauer.com joins 2 and 3 together. But now I'm wondering how it would do if I selected the zips first and then queried the extra data (good time to play with PostreSQL's subselects).

                                  Here it the complete code for you.

                                    This was interesting... I was playing with my query. I was using the 2 query approach. First query gets the target zip's lat/long coordinates. Query 2 does the radius check and brings back all the goodies.

                                    Splitting up this query made things worse. Here's the setup: Postgresql v7.3.4 (I believe), PHP v 4.3.8, zip distance calculated by C function built into Postgresql.

                                    The queries (note: there's a query to fetch lat/long values - this query is not included in the timing):

                                    Sub-select:

                                    $query = 'SELECT zipponame, zipclass, zipcode, ziplong, ziplat, stateabrev, statename, countyname, zip.zipcounty, geodist(' . $lat . ', ' . $long . ', ziplat, ziplong) AS distance FROM zip INNER JOIN state ON state.fipsstate = zip.zipstate INNER JOIN county ON county.fipsstate = zip.zipstate and county.fipscounty = zip.zipcounty WHERE zipcode IN (SELECT zipcode FROM zip WHERE geodist(' . $lat . ', ' . $long . ', ziplat, ziplong) <= ' . $get['radius'] . ') ORDER BY geodist(' . $lat . ', ' . $long . ', ziplat, ziplong)';
                                    

                                    Results: didn't return any data in an acceptable time frame.

                                    Fetch zip codes first:

                                    $ziplist = getZipCodeList($lat, $long, $get['radius']);
                                    $query = 'SELECT zipponame, zipclass, zipcode, ziplong, ziplat, stateabrev, statename, countyname, zip.zipcounty, geodist(' . $lat . ', ' . $long . ', ziplat, ziplong) AS distance FROM zip INNER JOIN state ON state.fipsstate = zip.zipstate INNER JOIN county ON county.fipsstate = zip.zipstate and county.fipscounty = zip.zipcounty WHERE zipcode IN (' . $ziplist . ') ORDER BY geodist(' . $lat . ', ' . $long . ', ziplat, ziplong)';
                                    

                                    This one worked. For 44137 and a 20 mile radius, it brought back the results in 9.5 seconds. Not good.

                                    Combine query 2 and 3 together:

                                    $query = 'SELECT zipponame, zipclass, zipcode, ziplong, ziplat, stateabrev, statename, countyname, zip.zipcounty, geodist(' . $lat . ', ' . $long . ', ziplat, ziplong) AS distance FROM zip INNER JOIN state ON state.fipsstate = zip.zipstate INNER JOIN county ON county.fipsstate = zip.zipstate and county.fipscounty = zip.zipcounty WHERE geodist(' . $lat . ', ' . $long . ', ziplat, ziplong) <= ' . $get['radius'] . ' ORDER BY geodist(' . $lat . ', ' . $long . ', ziplat, ziplong)';
                                    

                                    Time is about 0.6 seconds for 44137 and a 20 mile radius. This is the current query I use on my page (http://www.rsbauer.com/zip/). I've left the timer up in case people would like to play with it.

                                    Now do a zip of 44137 and 1000 miles and it takes about 17.5 seconds. Drawmack, you had mentioned for 1000 zips, your query takes about 20 seconds. So, the approach I'm using doesn't seem to be too bad, even though it IS checking the distance for each (42,000+) zip code.

                                    Interesting stuff...

                                    If people would like to see the C code (PostgreSQL flavor) for geodist(), let me know. I have it in C++ flavor for MySQL as well.

                                      Well I'm not sure why I said 20 seconds, but I put a timer on mine and here is the output from it:
                                      There are 25933 towns within 1000 miles of Gilbert, PA
                                      Zip codes found in 0.92205190658569 seconds.

                                      So that's 1000 mile radius, with the fixed code, in less then 1 second. Yours is over a factor of 17 slower then this.