14 Replies Latest reply on Jun 26, 2012 2:25 AM by Malcolm

    Zipcode Radius Search - Speed Issue


      I can use some help/advice on a simple zip radius solution. I have a list of zip codes and the ability to find ones within a certain mile radius. I'm using a custom function "DistanceBetweenPoints ( Lat1; Lon1; Lat2; Lon2; Units )" by Vaughan Bromfield, Vaughan Bromfield Design.


      The process works as expected. The problem is speed. Somewhere in the process a calculation must occur. I've tried having the calculation at the field level (both calculated field and auto-calc), I've tried inserting the calculation into a number field, etc. I've also experimented with Replace Field Contents, Looping a Set Field, using ExternalSQL, etc.


      Am I missing something very obvious? Is there a way to do this in FileMaker with better speed results? The potential client who is asking for this feature would like to use it on his web site. Currently, the way I have it set up, it is very slow on my local machine (iMac 2.66 GHz, Intel Core 2 Duo, 4 GB memory), so I cannot imagine hosting it and using CWP to access the data.


      By the way, what I have in mind is something like this.


      All thoughts/comments/suggestions welcome!

        • 1. Re: Zipcode Radius Search - Speed Issue

          There is a better way. It's called a geo-box. A box is easier to calculate than a circle.


          Get a central point lat and lon. Calculate the lat and lon points of the box around the central point. Use these four lat and lon pairs in a multi-predicate relationship to your zip codes. The relationship will select those zips within the box WITHOUT having to calculate the distance of each.


          If you realy need a circle, filter the portal displaying the results and use the distance calculation in the filter calc. You've reduced the calculation load from thousands to dozens.

          • 2. Re: Zipcode Radius Search - Speed Issue

            Thanks Vaughan.  You are right, it is a better way; much, much faster.  I'm using the geoBox() custom function that Steve Murray wrote.


            While it is much faster, it is not as accurate.  Searching for zip codes 30 miles out produces a result of zips up to 40 miles; 125 miles returns zips as far away as 174 miles.  As you mentioned, I could use portal filtering to hone the results.  However, my plan is to use this in CWP so I don't believe portal filtering will help there.


            Any thoughts on how to refine the results to be closer to the requested radius?



            • 3. Re: Zipcode Radius Search - Speed Issue

              If you have a fixed number of postal codes to use as the center point, create a table with the distance between postal codes precomputed. It may have hundreds of thousands, or millions of records, but any one is a very fast lookup. And you could combine it with the geobox CF, you should be able to get your set very quickly.


              My list of unique US ZIP Codes has about 50,000 records, so a limited number of center points yields 50,000 records per point, or about 2,500,000,000 records for all combinations. Large, but well within FileMaker's capability of 64,000,000,000,000,000 total records.


              Now, if you do Canadian postal codes, the total jumps to 640,000,000,000  records for all combinations, which is stll a fraction of what FileMaker can handle. Your hard drive is another story. ;-)

              • 4. Re: Zipcode Radius Search - Speed Issue

                I'm using the geoBox() custom function that Steve Murray wrote.


                While it is much faster, it is not as accurate. 


                Any thoughts on how refine the results to be closer to the requested radius?



                Working to only three decimal places may be too crude. Have you checked the size of the values against known data?


                Second, it's a box so anything that falls into one of the corners is going to be outside the defined radius.



                • 5. Re: Zipcode Radius Search - Speed Issue

                  Maybe you can refine your technique to have the best of both worlds.


                  Use the GeoBox() calculation to narrow down the possibilities. An alternative to setting up multi-predicate relationships is to do a search. Since this will be a web-facing app, they will need to submit a value/push a button to find their results so a FileMaker Find would work well into the flow. Search for records that have latitude and longitude within the ranges provided by GeoBox().


                  Then use the slower DistanceBetweenPoints() calculation to refine the results on that smaller set of records.

                  An alternative to portal filtering is to loop over the smaller set of records and omit the ones that fall outside of the distance calculated by DistanceBetweenPoints().



                  • 6. Re: Zipcode Radius Search - Speed Issue

                    Thanks Malcom.  Based on your suggestion, I'm now working with a Lat/Long set of six decimal places.  It took a bit of searching to get it and I'll make another post in this thread about that process.


                    Also, your comment about the box is well taken.  Thanks.

                    • 7. Re: Zipcode Radius Search - Speed Issue

                      Thanks for your suggestions, Tom.  I'm working through them and will post here when I have the results.

                      • 8. Re: Zipcode Radius Search - Speed Issue

                        Thanks for your comment, David.  I certainly like the fact that search on any one record would be "a very fast lookup."  And although FMP can handle the record set, I'm wondering how long it takes to populate the tables.  Have you done this?  How long did it take?

                        • 9. Re: Zipcode Radius Search - Speed Issue

                          Since I have a small number of fixed center points, I've never bothered to time the process, or to do anything to optimise it.


                          However, you piqued my curiosity.


                          I have about 47000 unique postal codes (Mostly ZIP Codes, but a few Canadian postal codes.)


                          Computing  the distances between a single zip and all 47000 others took 25 minutes over the LAN to a server-hosted database.


                          Moving the files to my Mac mini (2.3 GHz i5 w 8 GB) and running it again took a touch over 9 minutes.


                          Perfectly acceptable for a small number of center points (I have accumulated about 50 center points over the years - locations of our seminars), but about a year of computing for my mini to go through all 47000. I suspect that my methodology could be improved upon. ;-)

                          • 10. Re: Zipcode Radius Search - Speed Issue

                            Thanks for the follow-up, David.  I do like your system, especially the speed.  In my case, any one zip code could be a center point for any other zip code.  In my set I have 41,874 records.  Not only can those all be center points, but the system needs to be flexible on the radius.  Could be 1 mile or 200 miles or anywhere in between.  Whole numbers only, but no limited to 200.  So it seems like it would not be feasible in my case.  Still, I like hearing how you have approached it, so thanks for explaining your system.

                            • 11. Re: Zipcode Radius Search - Speed Issue

                              Thanks to all who have responded.  The replies were very helpful.  I went with Tom's system of doing a search and then refining that found set in lieu of a multi-predicate relationship (see attached).


                              Regarding the latitude and longitude data, Malcom suggested that the original decimal places (2) might not be robust enough.  Hoping to get more accurate latitude and longitude numbers, I did the following:


                              1. Downloaded latest official zip code data set for free (use smaller data set: "Primary location only [Just one "Primary" location for each Zip code]" ):



                              2. Downloaded the "Zip Code Tabulation Areas" from the US Census Bureau (http://www.census.gov/geo/www/gazetteer/gazetteer2010.html).  Unfortunately, this record set does not contain cities and states, so I imported this data using the zip code as a match field to the previous download (the "Primary location only" set from federalgovernmentzipcodes.us).


                              Although the "Zip Code Tabulation Areas" had a more refined latitude/longitude (6 decimal places), I soon discovered that not all of the zip codes had it.  In fact, the first record set ("Primary location only") has 41,874 records and the "Zip Code Tabulation Areas" has 33120.  So, 8,754 records must be calculated on the shorter zip code.


                              Note that in the attached file I did not put in logic to look for the longer zip code.  I just reverted back to the original set, using the zip code with 2 decimal places.  Alternatively, I could have updated the 33k records with the longer zip code and left the remaining 8.7k as the 2 decimal.  I did neither as it is only a demo.  I may choose to do the latter in production.


                              Finally, it does beg the question (at least for me), where does one get the zip codes?  I've seen sites where they can be purchased.  I've also seen sites where they can be downloaded for free.  But how dependable is that data.  Seems like the best source would be the US Post Office, though I didn't see a download option there.  The Census Bureau is another good place, but the data can be up to 10 years old.  To that end, zip codes, it seems, can change, so it seems that anyone using this type of technique would need to check/update the data regularly (e.g. yearly?).


                              As always, I'm interested in the thoughts/experiences of others.



                              • 12. Re: Zipcode Radius Search - Speed Issue





                                I’ve used Zip-Codes.com for about 5 years now, and always with great results.








                                I buy the “business” subscriptions for $139 per year, with updates delivered on the 1st of each calendar month. But even the $39 per year subscription includes longitude and latitude. (I’ve only used the US data, but they also have Canadian postal codes as well.) I wrote a script to import the updated zip code file into my database once a month, so the maintenance is pretty simple.




                                I use the data for checking city + state + zip combinations when users enter addresses, for presenting statistical data by zip, determining the local time (including daylight saving time) for every Contact in the database (on a real-time basis), and for calculating approximate distances between two zip codes, etc.




                                Bottom line: I highly recommend this source.




                                Of course using a WSDL-based web service would be another option, but that would probably cost a bit more.




                                Peace, love & brown rice,


                                Morgan Jones




                                FileMaker + Web:  Design, Develop & Deploy


                                Certifications: FileMaker 9, 10 & 11


                                One Part Harmony <http://www.onepartharmony.com/>  


                                Austin, Texas • USA



                                • 13. Re: Zipcode Radius Search - Speed Issue

                                  I second the vote for zip-codes.com.


                                  Also, I've been playing with my code, and gained almost an order of magnitude by precomputing the sine and cosine functions for the lat/long values.


                                  I can now generate the 47000 records in 98 seconds.

                                  • 14. Re: Zipcode Radius Search - Speed Issue

                                    Regarding the latitude and longitude data, Malcom suggested that the original decimal places (2) might not be robust enough. 


                                    Google maps API provides lat/lon data accurate to six decimal places.


                                    For ad-hoc queries I use a google map query in a web viewer and a script to scrape the the lat/lon from the response.


                                    For building data in large sets I use Google Refine. It is a great general tool for cleaning up data and it is useful for building data too. Given sufficient address data it will happily pull in json data from google maps which can then be parsed for lat/lon and other info. The only limit then is that Google itself has a fixed limit on the number of queries per day so if your data set is large you'll need to do it in batches.