1 2 Previous Next 15 Replies Latest reply on Apr 7, 2015 12:08 AM by user19752

    Find duplicate Records

    petery009

      Hi All

       

      I don't what i did "Wrong" or "Difference" from before.

      I found that use "!" to find duplicate is not working properly.

      I will continue to look at my file and data and more test.

       

      But meanwhile i like to see if anybody else has the similar experience, please share with me .

       

      Thanks,

      peter

        • 1. Re: Find duplicate Records
          petery009

          Hi All

           

          This is basically what i try to achieve:

          i have a foundset of records , i need the unique value in field A.

          So i want to check if there's any duplicate in field A.

           

          Any other ways to do this rather than use "!" ?

           

          Thanks,

          peter

          • 2. Re: Find duplicate Records
            Mike_Mitchell

            To answer your question, you can do a self-joining relationship on the target field and see how many records appear.

             

            But perhaps you can explain what you mean by "not working properly". Exactly what are the symptoms you're seeing?

            • 3. Re: Find duplicate Records
              StephenWonfor

              Peter

               

              ! has worked for "eons" - have you sorted the data after the search so your dupes will cluster together?

               

              To expand on what Mike suggested:

               

              1.  make sure your data has an non-modifiable incrementing serialID for each record.

              2.  create a self join (unsorted) on the field you want to contain unique values.

              3.  create a calc that tests to see if the serialID for any record is equal to the serialID via the relationship.  When they match you have the "master" record, when they don't match you have a duplicate.

               

              Stephen

               

              "A lot of people like snow. I find it to be an unnecessary freezing of water." ~Carl Reiner

              • 4. Re: Find duplicate Records

                You could make a self join based on field A.

                Add a calculated field to count the related records.

                If more than one, you have duplicates.

                 

                FileMaker's helps shows how to write a calc to make label the first match as unique and the others as duplicates but you may not need that complexity.

                Do a find for greater than one and list and sort the duplicates if that is enough.

                 

                This might not find matches that have an extra space or end or don't end with a period, etc.

                • 5. Re: Find duplicate Records
                  petery009

                  Thanks!

                   

                  This is what happen to me:

                  i did a find on field B, and i found 30 out of 10,000 records.

                  and

                  i did a constrain find on field A, with "!"

                  it gives me the same foundset 30 out of 10,000.

                  and all the 30 records' field A are unique.

                  • 7. Re: Find duplicate Records
                    Mike_Mitchell

                    Can't duplicate.

                     

                    What kind of field is this? Text? Number? Date? Error messages?

                    • 8. Re: Find duplicate Records
                      petery009

                      Hi Mike

                       

                      It's a text field. No error message.

                      I shared the file that i had problem with below. ( i did a lot cleaning of the file, but the "problem" is still exist.)

                       

                      Dropbox - DMS_Data.zip

                       

                      Steps to duplicate:

                      1. Do a find on TDD_kfnAssemblySysId with value "545"

                      2. Do a constrain find on TDD_lxtPartNumber with "!"

                       

                      I think it's just the data corruption. but i don't know how i get that and is there a way to recover or what to do?

                       

                      Thanks,

                      peter

                      • 9. Re: Find duplicate Records
                        StephenWonfor

                        Peter

                         

                        Which field do you expect the duplicates in?

                         

                        Stephen

                         

                        -


                         

                        "The measure of a man's real character is what he would do if he knew he would never be found out." -Thomas Babington Macaulay

                        • 10. Re: Find duplicate Records
                          siplus

                          I defined a field "codeIt" as Code(TDD_kfnAssemblySysId) & "-" & Code(TDD_lxtPartNumber)

                           

                          I did a search on 545 in TDD then constrained ! on this field, and it found no duplicates.

                          • 11. Re: Find duplicate Records
                            petery009

                            Hi Stephen

                             

                            If you download the file successfully.

                             

                            Steps to duplicate:

                            1. Do a find on TDD_kfnAssemblySysId with value "545"

                            2. Do a constrain find on TDD_lxtPartNumber with "!"

                             

                            I want to find duplicate in TDD_lxtPartNumber within a foundset.

                             

                            Thanks.

                            • 12. Re: Find duplicate Records
                              electon

                              It does show some strangeness.

                              I think what is happening is that a lot of your records in PartNumber are duplicates.

                              Looks like it finds the duplicates first, then constrains by the SysId.

                              So the result you get : here are the records that are duplicates and are within the original found set. Which should be of course the other way around.

                              It could be that find duplicates behaves totally different than other criteria. Just because duplicates are mostly undesired in a data set from primary key view.

                              If you first look for 545 then omit records and delete found set, show all, then do a constrain on duplicates it will show "nothing found".

                              It may be an interesting bug report.

                              P.S. it's not data corruption. I exported, then imported the records into a new database and results are the same.

                              • 13. Re: Find duplicate Records
                                nicolai

                                I did not use the constrain found set like this before and to be honest, I would expect the same result as you.

                                 

                                My theory is that all the records you have in your found set have duplicates outside of your found set. These duplicates are excluded with constrain found set giving you the same set of records. By the way, you do not need constrain found set, you can use a single find request and set up both fields to the same values - this will give you the same result.

                                 

                                The work around is a self-join suggested by Mike_Mitchell ot local calculation using ExecuteSQL to get the ids of duplicated records.

                                 

                                I tried this as a calculation, it will return the duplicate record id or nothing if there is no duplicates:

                                 

                                ExecuteSQL( "
                                     SELECT TDD_systemId
                                     FROM TDD_ToolDetail
                                     WHERE TDD_kfnAssemblySysId = ?
                                     AND TDD_lxtPartNumber =?
                                     AND TDD_systemId !=?
                                " ; ""; "" ; 
                                TDD_ToolDetail::TDD_kfnAssemblySysId;
                                TDD_ToolDetail::TDD_lxtPartNumber;
                                TDD_ToolDetail::TDD_systemId)
                                
                                • 14. Re: Find duplicate Records

                                  Dowloaded your file and opened it in 12 Advanced.

                                   

                                  Use ! for a manual search

                                  TDD_kfn.... returned 6191 duplicated records

                                  Tdd_lxtPartNumber returend 11,283 duplicated record

                                   

                                  You have empty TDD fields , some TTD are duplicated 30 times, etc.

                                   

                                  In other words, this database is a mess.

                                   

                                  I created this calc field to make one long string to compare:

                                   

                                  GetAsText(TDD_kfnAssemblySysId)  & "|" & TDD_lxtToolType & "|" & TDD_lxtToolSize & "|" & TDD_lxtToolName & "|" & TDD_lxtPartNumber

                                   

                                  This produced 5067 duplicated records.

                                   

                                  This is probably a more accurate method since some of fields may be duplicated in one field but unique in another.

                                   

                                  On a DUPLICATE of your file, create a calc field with the above calc. Then go to the help file and search for:


                                  Finding duplicate values using a self-join relationship


                                  This will produce a field where the first matching record is labeled Unique and the rest Duplicate.


                                  Search for Duplicate and delete the found set.


                                  The SystemID number can be reserialized. Note that I did not include this in the calc above.


                                  Note that in this example NONE of the fields have Unique set in any of the fields.

                                  Also, the check box for allow user to overide any warning is set to OK.


                                  This is one reason why you have duplicates. Plus you may have imported records and Filemaker does not reject a record for a duplicate value.

                                  1 2 Previous Next