5 Replies Latest reply on Jul 17, 2012 8:51 AM by philmodjunk

    Database clean-up - multiple records for a single person

    TreeHugger

      Title

      Database clean-up - multiple records for a single person

      Post

      Hi -

       

      I'm attempting to clean-up our database.  We have several individuals with more than one record (i.e., one person, but more than one record).  How can i combine the two into one without loosing any pertinent information?

       

      Thanks.

       

      Tricia

        • 1. Re: Database clean-up - multiple records for a single person
          philmodjunk

          That will depend on the data in the individual records and how you can determine which records all refer to the same person. (Could have two people with same name and even two records for same person but with different name information...)

          And deciding which data is "pertinent" might require human intervention. If a field in record 1 is empty and not in the second record for that person, a script can detect that and copy it. But what if there is different data in this field in both records? How would a script determine which to keep? Is there a date field in the record so that you can select in favor of the most recently edited record? Even then, the "merge" could be incorrect.

          It is possible to perform a find for duplicate values in a specified field and then a sort on the same field will group the records by that identical value, but then you'll have to decide if a script can do the merge from that point or if a human will have to review the groups of records and decide what to keep, what to delete and what to merge.

          • 2. Re: Database clean-up - multiple records for a single person
            davidanders

            Having a Date Created field and a Date Modified field would be most useful.

            It is likely the records with the least empty fields would be the best to keep. But not guaranteed.

            Making a copy of the database would be wise. Call it Dupes, find all duplicate records, delete the rest. You will have a backup.

            • 3. Re: Database clean-up - multiple records for a single person
              TreeHugger

              Thanks for your suggestions Dave.

              Our database does have "date created" and "date modified" fields, though i can see that both records may have info from the same year(s).  Why do you think those particular fields might be important in this case?

              Agreed, the records w/the most info (least empty fields) are likely the best to keep. There's no way to combine 2 records into one?

              We do have a copy of the entire database, so that's not an issue.

              Thanks again for your help.

              Tricia

              • 4. Re: Database clean-up - multiple records for a single person
                TreeHugger

                Thanks for you suggestions PhilModJunk.

                I've determined that the duplicates are acutally the same person. Some of the info in the records is duplicative (e.g., address, phone number), but other information is not (we provide education, so e.g., the training program area of the record does not have duplicative info, i.e., the two records show that the one person attended different trainings).

                How does one merge records?  How can i merge/copy specific information from the record i'll delete into the record I'll keep?  Can i move particular information and not duplicative information?

                Thanks again.

                Tricia

                • 5. Re: Database clean-up - multiple records for a single person
                  philmodjunk

                  There's no way to combine 2 records into one?

                  There are a number of options for combining the records. The issue is how to do it.

                  You could keep the record with  the most recent modification date copying data from any fields in other copies of the record into any fields that are empty in the most recent record--but this may restore data that was intended to be deleted.

                  You could just keep the newest copy of the record and delete the older copies, but this may delete needed data never entered into this copy but present in the older copy(s).

                  You could merge all data from all fields. If the fields contain identical data, don't modify the data in the field, if the data is different append the data from one record to the data in the other. If a field in one copy of the record has "apple" and the same field in the other record has "orange", you end up with "apple orange". but this may not produce desirable results either as maybe the data in one record is "aple" and is "apple" in the other. In which case "aple apple" is unlikely to be  desirable result.

                  You'll need to spot check some cases where you have duplicated records and decide what type of rules, if any, can be built into a script that merges the data for you. As I have previously suggested, you may not be able to get a scripted merge to work without human intervention to decide what data to keep, what to append and what to discard on a record by record basis.