8 Replies Latest reply on Jun 5, 2012 3:04 PM by philmodjunk

    script to find duplicate entries in a field

    KBGF75

      Title

      script to find duplicate entries in a field

      Post

       I want the capability to use a script to find duplicate entries in a given field of a found set of records. The file is FMP9. My initial focus is find records with duplicate email addresses. I want the script to work if one or more addresses has been duplicated.

      I found a Filemaker web page containing a sample script that seemed on target. A copy is attached ("Scripting example"). I created my own script, trying to follow the example. A copy of my script is attached ("script 20"). This script is shown twice: As printed from Scriptmaker and as shown on the monitor in edit mode. My relevant field names are "Mark" and "GlobalDupe."  Only the latter is global.

      My script runs, but erratically. Sometimes it seems to find a set of records in which the field of interest ("email_1") is empty. Sometimes it seems to move through all records (~4,000) without end, until I "escape."

      Please critique.

           -Al

        • 1. Re: script to find duplicate entries in a field
          philmodjunk

          Only image files of type jpg, gif or png can be uploaded with your post. If you have files of other types that cannot be uploaded, upload them to a file sharing site such as drop box and post the download link here.

          To post a script to the forum:

          1. You can upload a screen shot of your script by using the Upload an Image controls located just below Post A Answer.
          2. You can print a script to a PDF, open the PDF and then select and copy the script as text from the opened PDF to your clipboard for pasting here.
          3. If You have FileMaker advanced, you can generate a database design report and copy the script as text from there.
          4. If you paste a text form of the script, you can use the Script Pretty box in the Known Bugs List database to paste a version that is single spaced and indented for a more professional and easier to read format. (Use the HTML option on the database tab panel and paste the text into the forum's HTML editor.)
          • 2. Re: script to find duplicate entries in a field
            Sorbsbuster

            Would also help if you could clarify what you want to do with the duplicate records.  For example, do you want to show all the duplicates as a found set?  Do you want to find all records that are duplicated, and then filter them to only leave the 'extra' ones, in preparation for deletion, say?  If two records have an empty e-mail field, are they 'duplicates'?

            • 3. Re: script to find duplicate entries in a field
              KBGF75

              I will paste below the two files that did not upload. Sorry not to give you a more readable document.

              Regarding what I want to do with the found duplicate records, I don't want any automatic deletions. Some duplicates may have been caused by a data entry error. I just want to examine the duplicate records and make case-basis decisions about each. 

                      -Al                                                  

                                                 Scripting Example

               ln this example, a travel agency uses a registration database to track clients who sign up for a cruise Seminar.FP7. The database contains duplicate records because some clients called the travel agency to sign up and also mailed in a registration form for the same Seminar.FP7

              The following sections explain how to set up a database and define a script to find duplicate records.

               For this example, you should understand fields with Global storage and know how to define fields.

               CAUTION: You can't undo the action of deleting records. Use of this script to delete duplicate records should be run on a copy of your database. Always keep a backup of your database before performing any scripts that include steps for deletion of records.

               1. Create a unique value for each record

              The Seminar.FP7 file contains records of clients who register for a seminar. Each client has a unique identification number, the Client lD field. Some records contain the same identification number - these are the duplicate records you want to find.

               TIP: lf your database doesn't have a unique identification field, define a calculation field to create one from existing fields. For example, combine first name, last name, and birthday create a unique identification for each client.

               2. Add two fields

                    Define these fields in the Seminar.FP7 file:

                    -A text field, called Mark. When the script finds a duplicate record, it places an X in this field to mark the record.

                     -A *Text field with Global storage, called Global, to store the unique identification while comparing records. *Global should be the same data type as the Client lD field.

               3. Display the Mark field

                  Create a layout that displays the Mark field, or add the field to an existing layout. You'll use this layout to store Find setting in the following step.

               4. Define the script

                  ln Manage Scripts (previously known as ScriptMaker), define the Find Duplicates script:

               Show All Records

              Sort Records [Restore, No Dialog] Sort by Client ID in Ascending oder

              Go to Record/Request/Page [First]

              Replace Field Contents [No dialog,'Mark', " "]

              Loop

              Set Field [‘Global','ClientlD']

              Go to Record/Request/Page [Next, Exit after last]

              lf [‘Global = ClientlD'l

              Set Field ['Mark', "X"]

              Else

              Set Field ['Global','ClientlD']

              End lf

              End Loop

              Perform Find [Restore] Find records when Mark = X

               When you perform the Find Duplicates script:

                 1. lt finds all records, and then sorts them by the ClientlD field so that records with the same Client lD are grouped together.

                 2. Starting with the first record, the script copies the value from the ClientlD field into the Globalfield.

                 3. The script goes to the next record and compares the value in ClientlD with the value in Global.

                       -lf the values match, the record is a duplicate the script puts an X in the Mark field .

                      - lf the values don't match, FileMaker Pro copies the ClientlD value into Global. lt doesn't change the Mark field.

                 4. The script repeats step 3 until it reaches the last record in the file.

                 5. The script finds all records with an X in the Mark field, and then displays the found set of the duplicate records.

               -------------------------------------------------------------------------------------

              MY SCRIPT, TRANSCRIBED  AS PRINTED FROM SCRIPTMAKER

              Show All Records

              Sort Records [Specified Sort Order: Contacts::email_1; ascending]

                   [Restore; No dialog]

              Go to Record/Request/Page

                   [First]

              Replace Field Contents [Contacts::Mark; Replace with calculation: “ “]

                   [No dialog]

              Loop

                  Set Field [Contacts::GlobalDupe; Contacts::email_1]

                  Go to Record/Request/Page

                           [Next; Exit after last]

                  If [Contacts::GlobalDupe = Contacts::email_1]

                           Set Field [Contacts::Mark;  “X”]

                  Else

                           Set Field [Contacts::GlobalDupe; Contacts::email_1]

                  End if

              End Loop

              Perform Find [Specified Find Requests: Find Records; Criteria: Contacts::Mark: “X”]

                   [Restore]

              ------------------------------------------------------------------------------------------------------------

              MY SCRIPT, TRANSCRIBED AS SHOWN ON MONITOR

              Show All Records

              Sort Records [Restore; No dialog]

              Go to Record/Request/Page  [First]

              Replace Field Contents [No dialog; Contacts::Mark;  “ “]

              Loop

                  Set Field [Contacts::GlobalDupe; Contacts::email_1]

                  Go to Record/Request/Page [Next; Exit after last]

                  If [Contacts::GlobalDupe = Contacts::email_1]

                           Set Field [Contacts::Mark;  “X”]

                  Else

                           Set Field [Contacts::GlobalDupe; Contacts::email_1]

                  End if

              End Loop

              Perform Find [Restore]

               

              • 4. Re: script to find duplicate entries in a field
                philmodjunk

                Your script looks to be written correctly.

                What data type is the email_1 field? It might fail if it is of type number.

                What layout are you on when you perform this script? It should be a layout that lists "Contacts" in "Show Records From" in Layout setup.

                • 5. Re: script to find duplicate entries in a field
                  KBGF75

                  The email_1 field is data type 'text.'  I am performing the script from a Contacts layout.

                  After reading the last post from PhilModJunk I ran the script again. Unlike earlier tests, I didn't impatiently stop (with 'escape') in the middle of the several minute process. When the script stopped by itself, the found set consisted of 1587 records. In 1581 of these, the email_1 field was empty. In the other 7 the email_1 field was duplicated in a different record, outside the found set.

                  It looks like the script is basically doing its job, with a caveat: Why are the 1581 records with an empty field in the found set? By searching separately for all records with an empty email_1 field, I've confirmed that the file contains only 1581 such records, so the script got them all.

                  Another note: The script is supposed to place a "X" in a record's Mark field when it finds a dupe. In all 7 found records of that sort, the Mark field looks empty. Maybe this is because the script resets that field to finish the script? I wonder, since it clears the Mark field at the start of the script.

                  Finally, the above report is based on a single test. I will do more of these, and later I want to adapt the script to search for dupes in other fields.

                          -Al

                  • 6. Re: script to find duplicate entries in a field
                    philmodjunk

                    It looks like the script is basically doing its job, with a caveat: Why are the 1581 records with an empty field in the found set?

                    Take another look at your if step in the script. Since you have two or more fields with empty email address fields, your script will interpret that as a duplicate value. (empty field in global = empy field in current record is a true result here.)

                    The script is supposed to place a "X" in a record's Mark field when it finds a dupe.

                    As written, the script only marks second and subsequent duplicates. If you have three records with "MyName@domain.com" in the email record, only the 2nd and third records will be marked.

                    You can perform a find before running this script that omits all records where the email address field is empty (put an = and nothing else in the email field). You can also find only for duplicates by putting an ! in this email adress field to reduce the records your script loops through.

                    • 7. Re: script to find duplicate entries in a field
                      Sorbsbuster

                      @KBGF75 - "You can also find only for duplicates by putting an ! in this email address field" - I asked my questions about 'what the script needs to finally perform' because this is how duplicates are normally found.

                      • 8. Re: script to find duplicate entries in a field
                        philmodjunk

                        I suspect that this KB article may predate the use of ! but it's also more generally applicable to all possible searches for duplicates. There are reasons for not using such a find in some searches for duplicate values, but they don't appear to apply here.

                        This method will fail if you use ! in a field filled with a return separated list of values. Just as such a list works like an "OR" when matching values in a relationship, ! will find any records where any one one value in the list also exists as a value in the list of another record.

                        Example:

                        If a field named FruitList in one record has: Apple <return> Orange and FruitList in another record has Orange<Return>Kiwi. Performing a Find with ! will find these two records as they both have a duplicate value (orange) in their list of values.