10 Replies Latest reply on Mar 21, 2017 12:06 PM by sprynmd

    Find & Remove Duplicate records

    Suresh

      Dear All,

       

      How to find and remove duplicate records?

       

      Becasue My database contains some more no. of layouts and averagly 1K records each layout have.

       

      So how to find those and remove duplicates? Is there any easy way to find out & remove?

       

      Thanks,

      Suresh

        • 1. Re: Find & Remove Duplicate records
          electon

          You can find duplicates by typing ! in find mode in the field.

          How you define what is a duplicate and which one to leave is up to you as it will depend on the solution.

          • 2. Re: Find & Remove Duplicate records
            coherentkris

            I tend to use a self join on the relationships graph that matches equal to the fields you want to detect for duplicates and does not match (not equal symbol) on primary key.

            No matter what method you use you should thoroughly test to find out if their are any edge cases that would cause false positives or false negatives.

            For example find uses the field index and the index has limitations that may cause false results when the text field has more characters or words than the index has cataloged.

            Once your satisfied then script it.

            • 4. Re: Find & Remove Duplicate records
              ricardocpa

              The link provided by @siplus works like a charm, thank you for posting it in this thread.

               

              I see how the Counter fields help identify unique records from duplicate ones, but just out of curiosity: I don't understand how the field self-join::Counter behaves differently. Is it because through the relationship the duplicate records are not unique?

               

              • 5. Re: Find & Remove Duplicate records
                philmodjunk

                I find it simpler just to use import records to remove the duplicates. If you specify Unique Values, Validate Always, you can import the records into a clone (or just a copy of the current table) and the validation rule filters out the duplicates.

                1 of 1 people found this helpful
                • 6. Re: Find & Remove Duplicate records
                  sprynmd

                  A technique I'm using is a variation on the self-join approach.  I do this with existing fields and do not have to create any new or calculation fields as described in FileMaker's technique.  My technique takes advantage of the fact that every table I create has a unique primary key.  I haven't encountered any problems with this approach (so far).

                   

                  My table includes these fields:

                   

                  Planner

                  __Plan_PlannerID_pk

                  _Plan_PeopleID_fk

                  _Plan_ClassDefID_fk

                   

                  My primary key is a serial number.  (I'm not sure how well this would work if the key was a UUID.)  I then set up a self-join relationship.  Using less-than (<) means that only the smallest-value primary key will not show up in the self-join table:

                   

                  Dupe_relationship.tiff

                   

                  The "Go to Related Record" script step is set to match based on found records:

                   

                  Dupe_GTRR.tiff

                   

                  The script is pretty simple:

                  Dupe_script.tiff

                   

                  I found that if a record was being edited and not committed, depending on the scenario it might not be included in the related records.  Also, without the "Show All Records", you would only find the dupes of the found set.

                   

                  If there are no dupes, the found set is empty.

                   

                  I originally was using the sort/loop technique.  Admittedly my script could have been more efficient, but it was taking an hour to work through about 3000 records.  This approach cut that to 30 seconds.

                   

                  Critiques/suggestions are welcome.

                   

                  ...Mike

                  • 7. Re: Find & Remove Duplicate records
                    siplus

                    If you plan to have UUIDs as pk, you can also define a CreationUTC field autoentering Get(CurrentTimeUTCMilliseconds) and use that in the relationship with the "<" clause, instead of the pk.

                     

                    I would replace the If [ Get (FoundCount) > 0 ]  with if [ Get ( LastError) = 0 ] but that's just me.

                     

                    Notice that if you have records with empty values in both fields you use for "=" in the relationship, it won't work for them.

                    • 8. Re: Find & Remove Duplicate records
                      sprynmd

                      siplus:

                       

                      Thanks for your comments.

                       

                      I recognize your caution on the empty values.  In my case that's why I used the foreign keys because my scripts prevent those from being empty.  BUT, if I messed up, at least the records would not be matched, and therefore not be deleted.

                       

                      ...Mike

                      • 9. Re: Find & Remove Duplicate records
                        siplus

                        Get(LastError) after a unsuccessful GTRR raises a 101 Record is missing error, so Get(LastError) = 0 is correct.

                        • 10. Re: Find & Remove Duplicate records
                          sprynmd

                          You beat me to it!  I recognized my error right after I hit the return key and tried to remove it before you got to it.  You are absolutely right.

                          ...Mike S