You can find duplicates by entering ! in the field while in Find mode.
How you define a duplicate, and which record to keep, is up to you; it will depend on the solution.
I tend to use a self-join on the relationships graph that matches (=) on the fields you want to check for duplicates and does not match (≠) on the primary key.
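As a sketch of that self-join, assuming a table Contacts with a duplicate-check field Email and primary key ID (all names here are examples, not from the original post):

```
// Self-join from Contacts to a second occurrence, Contacts_Dupes:
Contacts::Email = Contacts_Dupes::Email
AND Contacts::ID ≠ Contacts_Dupes::ID
```

Any record with at least one related Contacts_Dupes record is a duplicate; a calculation such as `Count ( Contacts_Dupes::ID ) > 0` would flag it.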
No matter what method you use, you should test thoroughly to find out if there are any edge cases that would cause false positives or false negatives.
For example, Find uses the field index, and the index has limitations that may cause false results when a text field contains more characters or words than the index has cataloged.
Once you're satisfied, script it.
The link provided by @siplus works like a charm, thank you for posting it in this thread.
I see how the Counter fields help identify unique records from duplicate ones, but just out of curiosity: I don't understand how the field self-join::Counter behaves differently. Is it because through the relationship the duplicate records are not unique?
I find it simpler to just use Import Records to remove the duplicates. If you set the field validation to Unique value, Validate Always, you can import the records into a clone (or just a copy of the current table) and the validation rule filters out the duplicates.
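A minimal sketch of that approach, assuming the clone's match field has already been set to validate as Unique value, Validate Always (the file name is an example):

```
# Run from the original table; the clone file's match field
# is validated as Unique value, Validate Always
Show All Records
Import Records [ With dialog: Off ; "Contacts Clone.fmp12" ; Add ; Matching names ]
# Records that fail the Unique validation are skipped,
# so the clone ends up with one record per distinct value
```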
A technique I'm using is a variation on the self-join approach. I do this with existing fields and do not have to create any new or calculation fields as described in FileMaker's technique. My technique takes advantage of the fact that every table I create has a unique primary key. I haven't encountered any problems with this approach (so far).
My table includes these fields:
My primary key is a serial number. (I'm not sure how well this would work if the key was a UUID.) I then set up a self-join relationship. Using less-than (<) means that only the smallest-value primary key will not show up in the self-join table:
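As a hedged sketch of that relationship, assuming two match fields FieldA and FieldB and a serial primary key ID (occurrence and field names are illustrative):

```
// Self-join from Contacts to a second occurrence, Contacts_HigherID:
Contacts::FieldA = Contacts_HigherID::FieldA
AND Contacts::FieldB = Contacts_HigherID::FieldB
AND Contacts::ID  <  Contacts_HigherID::ID
```

Because the related side must have a strictly larger ID, the record with the smallest ID in each duplicate group can never appear on the related side; everything that does appear there is a duplicate that can be deleted.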
The "Go to Related Record" script step is set to match based on found records:
The script is pretty simple:
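The original script was posted as an image; the following is only a reconstruction based on the description in this thread, using the Get ( LastError ) check discussed in the replies and the example occurrence name Contacts_HigherID from the relationship sketch:

```
Commit Records/Requests [ With dialog: Off ]
Show All Records
Go to Related Record [ Show only related records ; Match found set ;
    From table: "Contacts_HigherID" ; Using layout: <Current Layout> ]
If [ Get ( LastError ) = 0 ]
    # Found set is now every duplicate except the lowest-ID record of each group
    Delete All Records [ With dialog: Off ]
End If
```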
I found that if a record was being edited and not committed, depending on the scenario it might not be included in the related records. Also, without the "Show All Records", you would only find the dupes of the found set.
If there are no dupes, the found set is empty.
I originally was using the sort/loop technique. Admittedly my script could have been more efficient, but it was taking an hour to work through about 3000 records. This approach cut that to 30 seconds.
Critiques/suggestions are welcome.
If you plan to have UUIDs as pk, you can also define a CreationUTC field autoentering Get(CurrentTimeUTCMilliseconds) and use that in the relationship with the "<" clause, instead of the pk.
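A sketch of that variant, again with illustrative names:

```
// CreationUTC: number field, auto-enter calculation:
Get ( CurrentTimeUTCMilliseconds )

// Relationship, replacing the pk "<" clause:
Contacts::FieldA = Contacts_Newer::FieldA
AND Contacts::CreationUTC < Contacts_Newer::CreationUTC
```

One caveat: two records created in the same millisecond would tie, and with a strict "<" neither would appear as a duplicate. Unlikely, but worth knowing.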
I would replace the If [ Get ( FoundCount ) > 0 ] with If [ Get ( LastError ) = 0 ], but that's just me.
Notice that if you have records with empty values in both fields you use for "=" in the relationship, it won't work for them.
Thanks for your comments.
I recognize your caution on the empty values. In my case that's why I used the foreign keys because my scripts prevent those from being empty. BUT, if I messed up, at least the records would not be matched, and therefore not be deleted.
Get ( LastError ) after an unsuccessful GTRR returns error 101, "Record is missing", so Get ( LastError ) = 0 is correct.
You beat me to it! I recognized my error right after I hit the return key and tried to remove it before you got to it. You are absolutely right.