6 Replies Latest reply on Jan 29, 2014 9:15 AM by ScottFishkind

    Deleting Duplicates with Different Criteria



      Deleting Duplicates with Different Criteria


           Hi FM Pro users

           I am very new to FM Pro but have now spent the last 18 hours hooked... with very little sleep.

           I have multipal lists both in .xls and .csv with 1000's of contact records that I would like to consolidate into one list. These contact lists contain many duplicates of which have varying information. For instance one list might have John Doe with Company name and email but no address or phone number. Another list might have John Doe with only address and phone number. 

           When I import a new list of contacts into my master contact list I was able to write the script for DeDuping based on first and last name criteria replacing the Mark field with "X" and it works great (see script screen attachment).

           But as I continue to import more lists I need to be able to delete only the duplicates with the least amount of additional information (such as addresses or phone numbers) or that might contain the additional data I already have. But if one file is missing a phone number I need to then import it into the contact.

           Any ideas or suggestions on how to do this would be extremely helpful. 

           Thank you!




        • 1. Re: Deleting Duplicates with Different Criteria

               But if an address field for one record has "234 Main St" and the address field of its duplicate is "2345 Main St." how will you (or a script) know which is the one to keep and which one is outdated or an error? I just had to throw that one out here. Merging "messy" data of this sort is a major headache and you may not be able to fully avoid involving human judgement in the merge process.

               And you can very easily get multiple contacts with the same exact first and last names but for completely different people. And it can be impossible to determine that this is the case just from the data. Even identical addresses may not identify two contacts as duplicates as people with the same first and last name may in fact have the same address. ("Oh you mean John Smith SENIOR!, He also lives at that address as John Smith JUNIOR is his son...")

               Can you tell that I've had this headache myself? wink I once set up a database that managed a volunteer community service day with over 4,000 volunteers for my church. The organizers had it set up so that it was easy, even likely that more than one person would submit the same information on the same volunteer in order to register them. It took a lot of hours and a lot of emails/phone calls to eliminate duplicates and by the day of the event, we still had one or two pairs of records where we weren't sure if they were or were not for different people.

               But now that I've possibly ruined your whole day, take a look at the GetNthRecord function. You can use Get ( RecordNumber ) to capture the record number of the first record of a possible set of duplicates in a variable at the same time that you set $id to a value. You can then use GetNthRecord with such a variable to access the value of different fields in that "first" record to compare to fields in the current record that your script has identified as a duplicate. Your script could then move data from one duplicate record to another in order to merge data.

          • 2. Re: Deleting Duplicates with Different Criteria

                 Thank you PhilModJunk. I see it's going to require some creative work arounds. Thank you for the suggestion. I am sure it will come in handy. Btw, the lists that I'm working with doesn't have inconsistent duplicate information it's just that some of them are incomplete so I need to piecemeal contact info as they are ported in. Is there a way to merge data on import so that when I import a list with corresponding duplicate names it will prompt me to merge info or choose which one I want to import? That would be a very helpful function to help consolidate my lists. Thanks again for your help!

            • 3. Re: Deleting Duplicates with Different Criteria

                   The additional info would seem to still leave the door open for the possibility of getting two individuals with identical names.

                   But you could import data into a second table and link them by relationship to each other. You could then find all records where there is a matching record in the main table if you need to examine such before merging data. Then you could set up a looping script that selectively copies the data into your main table from each record. If this is a case where you are only copying over data if the field is blank in your main table, this could be a pretty simple script by setting up fields in your main table to use this relationship to "look up" data from the related table. But I still see issues with how you might match up your records to be imported.

              • 4. Re: Deleting Duplicates with Different Criteria

                     I don't know if I mentioned this but one thing that is consistent on all of these lists in the first and last name and I am do all duplicate searches by these criteria. I think your suggestion is also a helpful one for matching and merging records. One problem I'm running into that's slowing me down is sometimes there is a blank space after the name and it's not finding the duplicate. I saw there is a Trim formula for removing spaces http://help.filemaker.com/app/answers/detail/a_id/5140 but I've been trying to figure out where to out the formula or how to run it and I am not having any success. Can you give me some insight into where I can find this? The only thing I found was for Pro (advanced) http://www.filemaker.com/11help/html/fmpa_custom.23.5.html


                • 5. Re: Deleting Duplicates with Different Criteria

                       Trim ( Self ) could be defined as an auto-enter calculation with the "do not replace existing value..." check box cleared. If you enable auto-enter options during import, this will trim trailing spaces from the field during import.

                       Replace field contents with Trim ( FieldNameHere ) as the calculation option can do this for a found set of records--and that found set can be every record in the table if you do a Show All Records.

                  • 6. Re: Deleting Duplicates with Different Criteria

                         Thanks Phil but that one went over my head. I'll have to dig a bit more when I get some time.