There is a better way. It's called a geo-box. A box is easier to calculate than a circle.
Get the central point's lat and lon. Calculate the lat and lon bounds of the box around that point. Use these four values in a multi-predicate relationship to your zip codes. The relationship will select the zips within the box WITHOUT having to calculate the distance to each.
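The box calculation described above can be sketched in a few lines. This is a minimal Python version, not the actual geoBox() custom function mentioned later in the thread; the 69-miles-per-degree figure and the example coordinates are assumptions for illustration:

```python
import math

def geo_box(lat, lon, radius_miles):
    """Return (min_lat, max_lat, min_lon, max_lon) for a box that
    encloses a circle of radius_miles around (lat, lon).
    Approximation: 1 degree of latitude is about 69 miles;
    a degree of longitude shrinks by cos(latitude)."""
    lat_delta = radius_miles / 69.0
    lon_delta = radius_miles / (69.0 * math.cos(math.radians(lat)))
    return (lat - lat_delta, lat + lat_delta,
            lon - lon_delta, lon + lon_delta)

# Example: a 30-mile box around a hypothetical central point
min_lat, max_lat, min_lon, max_lon = geo_box(30.2672, -97.7431, 30)
```

The four returned bounds are exactly the values you would feed into the multi-predicate relationship (lat >= min_lat AND lat <= max_lat AND lon >= min_lon AND lon <= max_lon).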
If you really need a circle, filter the portal displaying the results and use the distance calculation in the filter calc. You've reduced the calculation load from thousands to dozens.
Thanks Vaughan. You are right, it is a better way; much, much faster. I'm using the geoBox() custom function that Steve Murray wrote.
While it is much faster, it is not as accurate. Searching for zip codes 30 miles out produces a result of zips up to 40 miles; 125 miles returns zips as far away as 174 miles. As you mentioned, I could use portal filtering to hone the results. However, my plan is to use this in CWP so I don't believe portal filtering will help there.
Any thoughts on how to refine the results to be closer to the requested radius?
If you have a fixed number of postal codes to use as center points, create a table with the distances between postal codes precomputed. It may have hundreds of thousands, or millions, of records, but any one is a very fast lookup. And if you combine it with the geobox CF, you should be able to get your set very quickly.
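The precomputed-table idea can be sketched as follows. This is a Python sketch with hypothetical sample data, not the poster's FileMaker implementation; the haversine formula is an assumption, since the thread doesn't say which distance calculation the DistanceBetweenPoints() CF uses:

```python
import math

def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance via the haversine formula (miles)."""
    r = 3959.0  # mean Earth radius in miles
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical sample data: zip -> (lat, lon)
zips = {
    "78701": (30.2672, -97.7431),
    "10001": (40.7506, -73.9972),
}
centers = ["78701"]  # the fixed center points

# Precompute one row per (center, zip) pair
table = {
    (c, z): distance_miles(*zips[c], *zips[z])
    for c in centers for z in zips
}

# Any single lookup is then a constant-time dictionary access:
d = table[("78701", "10001")]
```

With a real table the precompute step is the expensive part (discussed further down the thread); every query after that is just a keyed lookup.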
My list of unique US ZIP Codes has about 50,000 records, so a limited number of center points yields 50,000 records per point, or about 2,500,000,000 records for all combinations. Large, but well within FileMaker's capacity of 64,000,000,000,000,000 total records.
Now, if you do Canadian postal codes, the total jumps to 640,000,000,000 records for all combinations, which is still a fraction of what FileMaker can handle. Your hard drive is another story. ;-)
I'm using the geoBox() custom function that Steve Murray wrote.
While it is much faster, it is not as accurate.
Any thoughts on how to refine the results to be closer to the requested radius?
Working to only three decimal places may be too crude. Have you checked the size of the values against known data?
Second, it's a box so anything that falls into one of the corners is going to be outside the defined radius.
Maybe you can refine your technique to have the best of both worlds.
Use the GeoBox() calculation to narrow down the possibilities. An alternative to setting up multi-predicate relationships is to do a search. Since this will be a web-facing app, users will need to submit a value/push a button to find their results, so a FileMaker Find would fit well into the flow. Search for records that have latitude and longitude within the ranges provided by GeoBox().
Then use the slower DistanceBetweenPoints() calculation to refine the results on that smaller set of records.
An alternative to portal filtering is to loop over the smaller set of records and omit the ones that fall outside of the distance calculated by DistanceBetweenPoints().
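The two-pass approach above can be sketched outside FileMaker. In this Python sketch (with hypothetical sample zips), a cheap bounding-box comparison stands in for the Find and a haversine calculation stands in for DistanceBetweenPoints(); the exact formula behind that CF isn't given in the thread, so haversine is an assumption:

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles."""
    r = 3959.0  # mean Earth radius in miles
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def zips_within_radius(zips, center_lat, center_lon, radius_miles):
    """zips: dict of zip -> (lat, lon).
    Pass 1: cheap bounding-box test (the Find).
    Pass 2: exact distance on the survivors (the slow calculation)."""
    lat_d = radius_miles / 69.0
    lon_d = radius_miles / (69.0 * math.cos(math.radians(center_lat)))
    in_box = [
        (z, lat, lon) for z, (lat, lon) in zips.items()
        if abs(lat - center_lat) <= lat_d and abs(lon - center_lon) <= lon_d
    ]
    return [
        z for z, lat, lon in in_box
        if haversine_miles(center_lat, center_lon, lat, lon) <= radius_miles
    ]

# Hypothetical sample data: zip -> (lat, lon)
zips = {
    "78701": (30.2672, -97.7431),   # center
    "78664": (30.5083, -97.6789),   # roughly 17 miles away
    "10001": (40.7506, -73.9972),   # far outside the box
}
nearby = zips_within_radius(zips, 30.2672, -97.7431, 30)
```

The box pass eliminates the vast majority of records cheaply; the expensive trig only runs on the handful that survive, which is exactly why the load drops from thousands of calculations to dozens.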
Thanks Malcom. Based on your suggestion, I'm now working with a Lat/Long set of six decimal places. It took a bit of searching to get it and I'll make another post in this thread about that process.
Also, your comment about the box is well taken. Thanks.
Thanks for your suggestions, Tom. I'm working through them and will post here when I have the results.
Thanks for your comment, David. I certainly like the fact that search on any one record would be "a very fast lookup." And although FMP can handle the record set, I'm wondering how long it takes to populate the tables. Have you done this? How long did it take?
Since I have a small number of fixed center points, I've never bothered to time the process, or to do anything to optimise it.
However, you piqued my curiosity.
I have about 47,000 unique postal codes (mostly ZIP Codes, but a few Canadian postal codes).
Computing the distances between a single zip and all 47,000 others took 25 minutes over the LAN to a server-hosted database.
Moving the files to my Mac mini (2.3 GHz i5 w 8 GB) and running it again took a touch over 9 minutes.
Perfectly acceptable for a small number of center points (I have accumulated about 50 center points over the years - locations of our seminars), but at roughly 9 minutes each, going through all 47,000 would take my mini about 290 days: close to a year of computing. I suspect that my methodology could be improved upon. ;-)
Thanks for the follow-up, David. I do like your system, especially the speed. In my case, any one zip code could be a center point for any other zip code. In my set I have 41,874 records. Not only can those all be center points, but the system needs to be flexible on the radius. Could be 1 mile or 200 miles or anywhere in between. Whole numbers only, but not limited to 200. So it seems it would not be feasible in my case. Still, I like hearing how you have approached it, so thanks for explaining your system.
Thanks to all who have responded. The replies were very helpful. I went with Tom's system of doing a search and then refining that found set in lieu of a multi-predicate relationship (see attached).
Regarding the latitude and longitude data, Malcom suggested that the original decimal places (2) might not be robust enough. Hoping to get more accurate latitude and longitude numbers, I did the following:
1. Downloaded the latest official zip code data set for free from federalgovernmentzipcodes.us (using the smaller data set: "Primary location only [Just one "Primary" location for each Zip code]"):
2. Downloaded the "Zip Code Tabulation Areas" from the US Census Bureau (http://www.census.gov/geo/www/gazetteer/gazetteer2010.html). Unfortunately, this record set does not contain cities and states, so I imported this data using the zip code as a match field to the previous download (the "Primary location only" set from federalgovernmentzipcodes.us).
Although the "Zip Code Tabulation Areas" set had more refined latitude/longitude values (6 decimal places), I soon discovered that not all of the zip codes had them. In fact, the first record set ("Primary location only") has 41,874 records while the "Zip Code Tabulation Areas" set has 33,120. So 8,754 records must fall back on the less precise (2-decimal) coordinates.
Note that in the attached file I did not put in logic to prefer the higher-precision coordinates; I just reverted to the original set, using the coordinates with 2 decimal places. Alternatively, I could have updated the 33k records with the 6-decimal coordinates and left the remaining 8.7k at 2 decimals. I did neither, as this is only a demo. I may choose to do the latter in production.
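The "prefer the 6-decimal coordinates, fall back to 2" merge described above could be sketched like this. The file layouts, column names, and sample rows here are hypothetical (stand-ins for the federalgovernmentzipcodes.us and Census downloads), shown with in-memory CSV snippets so the sketch is self-contained:

```python
import csv
import io

# Hypothetical CSV snippets standing in for the two downloads.
primary_csv = io.StringIO(        # "Primary location only" set, 2 decimals
    "Zipcode,Lat,Long\n"
    "78701,30.27,-97.74\n"
    "99999,61.22,-149.90\n"
)
census_csv = io.StringIO(         # "Zip Code Tabulation Areas" set, 6 decimals
    "GEOID,INTPTLAT,INTPTLONG\n"
    "78701,30.267153,-97.743061\n"
)

def load_coords(f, zip_col, lat_col, lon_col):
    """Read a CSV into a dict of zip -> (lat, lon)."""
    return {row[zip_col]: (float(row[lat_col]), float(row[lon_col]))
            for row in csv.DictReader(f)}

primary = load_coords(primary_csv, "Zipcode", "Lat", "Long")
census = load_coords(census_csv, "GEOID", "INTPTLAT", "INTPTLONG")

# Keep every zip from the primary set; upgrade to the 6-decimal
# Census coordinates when a match exists, otherwise fall back.
merged = {z: census.get(z, coords) for z, coords in primary.items()}
```

In this sketch the 33k matched records get the refined coordinates and the remaining 8.7k keep the 2-decimal ones, which is the "latter" option described above.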
Finally, it raises the question (at least for me): where does one get the zip codes? I've seen sites where they can be purchased and sites where they can be downloaded for free. But how dependable is that data? It seems the best source would be the US Post Office, though I didn't see a download option there. The Census Bureau is another good place, but the data can be up to 10 years old. Beyond that, zip codes can change, so anyone using this type of technique would need to check/update the data regularly (e.g. yearly?).
As always, I'm interested in the thoughts/experiences of others.
zipRadius2.fmp12.zip 2.4 MB
I’ve used Zip-Codes.com for about 5 years now, and always with great results.
I buy the “business” subscriptions for $139 per year, with updates delivered on the 1st of each calendar month. But even the $39 per year subscription includes longitude and latitude. (I’ve only used the US data, but they also have Canadian postal codes as well.) I wrote a script to import the updated zip code file into my database once a month, so the maintenance is pretty simple.
I use the data for checking city + state + zip combinations when users enter addresses, for presenting statistical data by zip, determining the local time (including daylight saving time) for every Contact in the database (on a real-time basis), and for calculating approximate distances between two zip codes, etc.
Bottom line: I highly recommend this source.
Of course using a WSDL-based web service would be another option, but that would probably cost a bit more.
Peace, love & brown rice,
FileMaker + Web: Design, Develop & Deploy
Certifications: FileMaker 9, 10 & 11
One Part Harmony <http://www.onepartharmony.com/>
Austin, Texas • USA
I second the vote for zip-codes.com.
Also, I've been playing with my code, and gained almost an order of magnitude by precomputing the sine and cosine values for the lat/long coordinates.
I can now generate the 47,000 records in 98 seconds.
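To illustrate the precomputation trick: if sin and cos of each latitude are stored once per record, the spherical law of cosines needs only one cos() and one acos() per pair instead of recomputing all the trig every time. This Python sketch with hypothetical sample data shows the idea; it is an assumption about the optimisation described above, not the poster's actual code:

```python
import math

# Hypothetical sample data: zip -> (lat, lon) in degrees.
zips = {
    "78701": (30.2672, -97.7431),
    "10001": (40.7506, -73.9972),
}

# One-time pass: store sin/cos of latitude plus longitude in radians.
pre = {
    z: (math.sin(math.radians(lat)),
        math.cos(math.radians(lat)),
        math.radians(lon))
    for z, (lat, lon) in zips.items()
}

def distance_precomputed(z1, z2, r=3959.0):
    """Spherical law of cosines using precomputed trig:
    only one cos() and one acos() remain per pair."""
    s1, c1, l1 = pre[z1]
    s2, c2, l2 = pre[z2]
    # Clamp to [-1, 1] to guard against floating-point drift.
    x = max(-1.0, min(1.0, s1 * s2 + c1 * c2 * math.cos(l2 - l1)))
    return r * math.acos(x)
```

The per-pair inner loop shrinks from six-plus trig calls to two, which is roughly where an order-of-magnitude gain on a large batch could come from.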
Regarding the latitude and longitude data, Malcom suggested that the original decimal places (2) might not be robust enough.
Google maps API provides lat/lon data accurate to six decimal places.
For ad-hoc queries I use a Google Maps query in a web viewer and a script to scrape the lat/lon from the response.
For building data in large sets I use Google Refine. It is a great general tool for cleaning up data, and it is useful for building data too. Given sufficient address data it will happily pull in JSON data from Google Maps, which can then be parsed for lat/lon and other info. The only limit is that Google has a fixed cap on the number of queries per day, so if your data set is large you'll need to do it in batches.