
    Another twist on finding duplicates

    marshdom

      Hi, I'm a novice user trying to do something a bit above my skill level. I have a database and I want to find and flag (or create a new table containing) one unique example from many similar records, based on the content of 3 or 4 fields.

       

      The data is a monthly airline schedule. Each flight has a number. At the moment the unique identifier is a Timestamp field, but the same event happens again and again throughout the month, sometimes with differing start times. It is these different 'variants' of each flight number which I need to identify and work on as a subset.

      [Screenshot attached: Flagged records.png]

      This screenshot shows what I'm trying to achieve. The Flt_Num records where I've coloured the fields are the 'unique' ones. Although flight 221 happens many times in the month, there are 2 unique variants of it, based on comparing the 'CI_Loc_Time' and 'CO_Loc_Time' fields.

       

      Likewise, flight 229 takes place on most days, but there are also 2 variants of it when comparing 'Route', 'CI_Loc_Time', 'CO_Loc_Time' and 'Block'.

       

      I need a dataset or table containing just these unique flight number variations (from around 3,000 records there are usually about 120 unique flight variations).

       

      Any help in how to approach this task would be really appreciated!

        • 1. Re: Another twist on finding duplicates
          rrrichie

          Different approaches to doing this...

           

          One is with SQL and a GROUP BY. I don't know if you want a count in the result set, but one option would be:

           

          ExecuteSQL ( "SELECT Flt_Num, COUNT ( Flt_Num ), Route, CI_Loc_Time, CO_Loc_Time, Block FROM Data_Import GROUP BY Flt_Num, Route, CI_Loc_Time, CO_Loc_Time, Block" ; "" ; "" )

           

          You can tune the "distinctness", or uniqueness, by removing or adding fields in the GROUP BY clause.
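
          If you don't need the count, a SELECT DISTINCT over the same fields gives you just the unique variants. A minimal sketch, assuming the table and field names from your screenshot:

          ExecuteSQL ( "SELECT DISTINCT Flt_Num, Route, CI_Loc_Time, CO_Loc_Time, Block FROM Data_Import" ; "" ; "" )

          With the separator arguments left empty like this, the result comes back as return-separated rows of comma-separated values.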

           

          In the old days (before FileMaker used SQL) we used to make a calculated field with the fields concatenated and then do a summary on that.
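
          For example, the concatenated field might look like this (just a sketch; the pipe character only keeps adjacent values from running together):

          // cFingerPrint, calculation field, text result
          Flt_Num & "|" & Route & "|" & CI_Loc_Time & "|" & CO_Loc_Time & "|" & Block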

           

          Depending on what you want to do with the data, other options might be better...

          • 2. Re: Another twist on finding duplicates
            Mike_Mitchell

            rrrichie has given you two good options. Another would be a self-join relationship that uses all the match fields as predicates. IOW, use Flt_Num, Route, CI_Loc_Time, CO_Loc_Time, and Block all as match fields. Then you can just see how many records exist on the other side of the relationship. If it's more than one, you have a duplicate.
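
            A sketch of that check, assuming the self-join is named selfJoin and that Flt_Num is never empty:

            // cDupCount, unstored calculation, number result
            Count ( selfJoin::Flt_Num )
            // 1 = only this record has these values; > 1 = duplicates exist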

            • 3. Re: Another twist on finding duplicates
              siplus

              If you need to select only unique records based upon your criteria and work with them, the easiest method is the old one that rrrichie mentions: create a composite key in a calculation field, build a self-relationship on it, then create a zz_I_am_dup_record field calculated as pk ≠ relationship::pk, and find all records that have 0 in this field. They are your found set, and you can do whatever you wish with them.
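
              Spelled out as a sketch (selfByFingerprint is an illustrative name for the self-relationship matching on the composite key):

              // zz_I_am_dup_record, unstored calculation, number result
              pk ≠ selfByFingerprint::pk
              // every record in a group sees the same first related record, so
              // exactly one record per group evaluates to 0; find 0 to get one
              // record per variant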

              • 4. Re: Another twist on finding duplicates
                erolst

                siplus wrote:

                create a composite key in a calculation field, build a self-relationship on it, then create a zz_I_am_dup_record field calculated as pk ≠ relationship::pk, and find all records that have 0 in this field.

                I don't think that's working, since you're only looking at the first related record.

                 

                You could throw in an IsEmpty ( FilterValues ( List ( relationship::pk ) ; pk ) ), but I think an easier way is to use a relationship that encodes your calculation expression, then check for related records …

                 

                cFingerPrint = selfCheck::cFingerPrint
                primaryKey ≠ selfCheck::primaryKey

                This matches only on the same fingerprint in different records.

                 

                A calc field cIsUnique: IsEmpty ( selfCheck::primaryKey )

                 

                evaluates to True if the record's fingerprint is unique (since even a single match would make it a duplicate).

                 

                Search that field for 1.
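
                In script terms, that find might look like this (a sketch, assuming the cIsUnique field above lives in the Data_Import table from the SQL example):

                Enter Find Mode [ Pause: Off ]
                Set Field [ Data_Import::cIsUnique ; 1 ]
                Perform Find []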

                • 5. Re: Another twist on finding duplicates
                  siplus

                  erolst wrote:

                  I don't think that's working, since you're only looking at the first related record.

                   

                   

                  Then *think* different!

                   

                  It does work, see the attachment.

                  • 6. Re: Another twist on finding duplicates
                    erolst

                    1. Your approach finds a single instance for each group of multiples – which, on re-reading, seems to be what the OP wanted … So sorry for muddying the waters.


                    My approach will find records that are “truly” unique, which I misread as the required functionality.

                     

                    siplus wrote:

                    Then *think* different!

                     

                    2. It seems to me that the word you wanted to put in bold is "different", not "think" …

                    • 7. Re: Another twist on finding duplicates
                      marshdom

                      Thank you all for taking the time to help me out. I'm in the process of trying the solutions you've offered (so far the 'composite key' and summary approach works; SQL and GROUP BY is next). I was staring at the screen, and now I'm motoring ahead! Have a great day. Dom