5 Replies Latest reply on Mar 20, 2009 12:47 PM by comment_1

    Identifying Duplicates in Filemaker Pro 9

    rowing190

      Title

      Identifying Duplicates in Filemaker Pro 9

      Post

      I have set up a process to identify duplicates in our Filemaker Pro database based upon a self-join of our contact table to itself and the creation of a counter field and a check duplicates field. In addition, I have created several calculation fields that identify the first three numbers of the zip and the first three letters of the first name. In addition to the last name field, these two fields are used in the self-join as the matching fields. The use of the calculation fields allows us to identify Don Fields 60601 and Donald Fields 60601 as duplicates.

       

      I am still relatively inexperienced at Filemaker Pro and scripting. My question is whether it is possible to write a script that will flag both the unique and the duplicate record so that those records can be exported into Excel and examined for the accuracy of the duplicate matching process. 

       

      Also, my intuition is that in the duplicate identification process described above, the first record in the database is designated as unique; subsequent records are identified as duplicates if they match the duplicate criteria in the self-join. Is that correct? If so, is there any other way to control which record is designated as unique and which is designated as the duplicate? For instance, an update field of one record may have of a date of 12/02/08; the update field of a second record may have a date of 03/15/09. I would want to keep the second record as its data has been updated recently. Is the only control over which record is identified as unique that of sorting the records beforehand according - in my example - to the update field - or any other field that one wishes to choose.

        • 1. Re: Identifying Duplicates in Filemaker Pro 9
          comment_1
            

          rowing190 wrote:

          my intuition is that in the duplicate identification process described above, the first record in the database is designated as unique; subsequent records are identified as duplicates if they match the duplicate criteria in the self-join. Is that correct?


          Not really. It's entirely up to you which one of the duplicates you consider to be "correct". Filemaker has no opinion to offer - they're all the same, as far as it can see.

           


          rowing190 wrote:

          My question is whether it is possible to write a script that will flag both the unique and the duplicate record so that those records can be exported into Excel and examined for the accuracy of the duplicate matching process. 


          You can perform a find using the ! operator to find only duplicate records (in this context, this means ALL records that have duplicates - IOW, the records that are NOT found have unique values).

           



          • 2. Re: Identifying Duplicates in Filemaker Pro 9
            rowing190
              

            ________________________________________________________________________________________________________

            Not really. It's entirely up to you which one of the duplicates you consider to be "correct". Filemaker has no opinion to offer - they're all the same, as far as it can see.

            ________________________________________________________________________________________________________

             

            That is not entirely true. Or perhaps we are talking about two different things.

             

            The process used for identifying duplicates described above is based upon a self-join relationship and upon a set of matching fields in the self-join. A counter field and a check duplicate field are also created. The check duplicate field employs the calculation - If(Counter = Contact 2::Counter; "Unique"; "Duplicate"). To run the process, enter your cursor in the counter field and then run replace contents on this field. Whenever duplicate records are identified based upon the matching criteria in the self-join relationship, values of unique and duplicate are automatically assigned to the check duplicate field in the records. I have no control over which record is identified as unique and which as duplicate in the actual running of this process. The logic governing which record is assigned unique and which duplicate seems to be that of the record's place in the data. The first record is designated unique and the second, third, fourth, etc. are designated as duplicates in the check duplicate field. At last that seems to be the case. 

             

            The difficulty with the process above is that I have no way of actually comparing the record defined as unque and the record defined as duplicate to see whether or not they are actually duplicates. I can do it in a piecemeal fashion. Find all of the records with a value of duplicate in the check duplicate field. Then go one by one through the group of duplicate records and determine if they really are a duplicate by performing a search based upon the name of the duplicate. That is why I was wondering if there was any way to flag both the unique and the duplicate record and export them into Excel. It would me much easier to check the accuracy of the process.


            • 3. Re: Identifying Duplicates in Filemaker Pro 9
              comment_1
                

              I am sorry, but you are confusing me with the way you use the term "unique". Among the duplicates, there is no unique record - they're all duplicates of each other. I understand you want to identify one of them as being "the one" (i.e. the one you want to keep).

              Now here's the thing: you have a relationship, and a calculation that references a related field (Contact 2::Counter). A reference to a related field always returns the value from THE FIRST RELATED RECORD, IN THE SORT ORDER OF THE RELATIONSHIP. If you have not defined a sort order for the relationship, then the default order is used - that is the order in which records were created.

              Therefore, the way you have it now, the oldest record (in order of creation) among the duplicates is being flagged as "the one". If you go into the definition of the relationship and sort the related records (on the Contact 2 side) by the date field, descending, then it will flag the most recent record.

               


              rowing190 wrote:
              The difficulty with the process above is that I have no way of actually comparing the record defined as unque and the record defined as duplicate to see whether or not they are actually duplicates.

              I don't see why not. If you find all the duplicates (using ! as mentioned before) and sort them by the concatenating field, you should be able to go over them in Filemaker just like you would in Excel.

               




              • 4. Re: Identifying Duplicates in Filemaker Pro 9
                rowing190
                  

                Thank you for taking the time to respond to my post and for clarifying that the first related record in the sort order of the relationship is the record that receives the value of "unique" in the check duplicate field; subsequent duplicate records will receive the value of "duplicate." That was my intuition, as I stated in my first post.

                 

                Ideally, I would like the process to insert a common value into the check duplicate field. That is, instead of inserting the values of "unique" and "duplicate" into the check duplicate field, a common numerical value is inserted. For example, the value of 001 would be inserted into the check duplicate field for the first set of duplicate records, 002 for the second set of duplicate records, etc. This would allow for a very easy way of checking the records to determine if in fact they are duplicates or not. Unfortunately, I am not trained in the creating of calculations or scripts. I have failed so far at my attempts to write such a calculation. If you know how to do so, that would be great - that is, if it is possible in Filemaker Pro.

                 

                Thanks,

                 

                rowing190 

                • 5. Re: Identifying Duplicates in Filemaker Pro 9
                  comment_1
                    

                  Yes, it's certainly possible, in more than one way, actually. If you have a unique SerialID field identifying the records (and you should always have such field), you could simply define a calculation field as =

                   

                  Contact 2::SerialID

                   

                  This will return the same value for the entire group of records. They won't be consecutive, but they should serve your purpose, I think.

                   

                   

                  You could also use conditional formatting to highlight the entire group to which the current record belongs. Or you could use a portal to show them in isolation - in short, there's no lack of options.