No matter how you slice it, I think you'll need that human element to review and evaluate possible "near matches" to see if they are truly identical given how "messy" your data is.
I do suggest you create a related customer table. Instead of using the variable to assign customer IDs, I'd create records in this table and use an auto-entered serial number field in this table to assign the customer ID numbers. The script could use the data in your existing table to create records with the customer name and other customer-specific data while also assigning a unique customer ID number.
During the review process, records with similar name/address data can be combined fairly easily, so that they link to a single record in this new table, with the help of some simple scripting.
This won't speed up the process at all, but it should leave you with a much better table structure when you are done, one that you can use going forward to better manage your data.
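To make the suggested structure concrete, here is a minimal sketch. It uses Python with SQLite purely as a stand-in for FileMaker, so the table names (customer, source_record), the column names, and both helper functions are hypothetical and not taken from the actual solution. AUTOINCREMENT plays the role of the auto-entered serial number field, and merge_customers shows the "combine after review" step.

```python
import sqlite3


def build_customer_table(rows):
    """Create one customer record per existing row and link the row to it.

    rows: iterable of (name, address) tuples (assumed layout of the existing table).
    """
    conn = sqlite3.connect(":memory:")
    # customer_id is assigned automatically, like an auto-entered serial number.
    conn.execute(
        "CREATE TABLE customer ("
        "customer_id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "name TEXT, address TEXT)"
    )
    conn.execute(
        "CREATE TABLE source_record ("
        "record_id INTEGER PRIMARY KEY, name TEXT, address TEXT, "
        "customer_id INTEGER REFERENCES customer(customer_id))"
    )
    for name, address in rows:
        cur = conn.execute(
            "INSERT INTO customer (name, address) VALUES (?, ?)", (name, address)
        )
        conn.execute(
            "INSERT INTO source_record (name, address, customer_id) VALUES (?, ?, ?)",
            (name, address, cur.lastrowid),
        )
    conn.commit()
    return conn


def merge_customers(conn, keep_id, duplicate_id):
    """After a human confirms a match, repoint records and drop the duplicate."""
    conn.execute(
        "UPDATE source_record SET customer_id = ? WHERE customer_id = ?",
        (keep_id, duplicate_id),
    )
    conn.execute("DELETE FROM customer WHERE customer_id = ?", (duplicate_id,))
    conn.commit()
```

The point of the design is that the original rows are never altered during review; only their link to the customer table changes, so a wrong merge can be corrected by repointing the link.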
Thanks for the reply, Phil.
Actually, I am doing something quite similar. The script is now creating the Cust ID, which will serve not only to identify probable unique candidates but also as the UID for the Customer Table.
In other words, using the draft Cust ID, I will sort out all of the customer information and replicate that into the new customer table.
However, because my script is analyzing all records that have not yet been assigned (25,000 in the beginning), it takes almost 2 minutes to complete the loop for one record (of 25,000 in total). At that speed, and assuming that duplicates will speed things up, it could take up to 13 days to complete the cycle.
Any ideas how to speed that up?
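For a rough sense of the numbers, the sketch below estimates total run time under one assumption the thread does not confirm: that the time for a pass is proportional to the number of records still unassigned. The function name and the records_resolved_per_pass parameter are hypothetical. At a flat 2 minutes per record, 25,000 records would take about 35 days; under the shrinking-pool assumption with no duplicates at all it comes to roughly 17 days, and each duplicate found in a pass shortens it further.

```python
def estimated_days(total_records, minutes_at_full_pool, records_resolved_per_pass=1.0):
    """Estimate run time when each pass scans only the still-unassigned records.

    Assumes pass time scales linearly with the size of the unassigned pool.
    records_resolved_per_pass > 1 models passes that also assign duplicates.
    """
    remaining = float(total_records)
    minutes = 0.0
    while remaining > 0:
        # A pass over a smaller pool costs proportionally less time.
        minutes += minutes_at_full_pool * remaining / total_records
        remaining -= records_resolved_per_pass
    return minutes / (60 * 24)


if __name__ == "__main__":
    print(round(estimated_days(25_000, 2), 1))       # no duplicates found
    print(round(estimated_days(25_000, 2, 1.5), 1))  # 1.5 records settled per pass
```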
Not without knowing a lot more about how your table is designed, how the script is written, and how the textSimilarity tool works.
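That said, one general technique that often helps with this kind of loop, whatever the tool, is "blocking": compare a record only against records that share a cheap key, rather than against every unassigned record. The sketch below is Python, not FileMaker, and makes several assumptions: the blocking key (first three alphanumeric characters of the name) is arbitrary, the 0.8 threshold is arbitrary, and difflib.SequenceMatcher is only a stand-in for the textSimilarity tool, whose behavior is unknown here.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def block_key(name):
    """Hypothetical blocking key: first three alphanumeric characters, lowercased."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum())
    return cleaned[:3]


def candidate_pairs(names, threshold=0.8):
    """Return (index_a, index_b, score) for likely duplicates within each block."""
    blocks = defaultdict(list)
    for idx, name in enumerate(names):
        blocks[block_key(name)].append(idx)

    pairs = []
    for members in blocks.values():
        # Only records sharing a key are compared, which avoids most comparisons.
        for a, b in combinations(members, 2):
            score = SequenceMatcher(None, names[a].lower(), names[b].lower()).ratio()
            if score >= threshold:
                pairs.append((a, b, score))
    return pairs
```

The trade-off is that two records whose keys differ (for example, a typo in the first few letters) are never compared, so the pairs it returns are candidates for the human review step rather than a final answer.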
Just found my solution; regrettably, it has nothing to do with FM.
For anyone trying to eliminate duplicate records that are similar but not necessarily exact duplicates, the solution would be FuzzyDupes!
It works extremely fast and gives you high quality output.