2 Replies Latest reply on Jul 31, 2013 11:58 PM by AaronTokunaga

    Summarizing and Merging Data based on Partial matching on a field?

    AaronTokunaga

      Title

      Summarizing and Merging Data based on Partial matching on a field?

      Post

           I apologize in advance if this was asked already but I searched and could not find a similar situation.

           Currently, the management is requesting a remake of the now ancient financial system originally written using a dead piece of software "The Card 8.0" by iFour (Japanese company).

           The system has been modernized and most features are flawlessly working and only required the office staff stop trying to make special cases for everything.

           However, we have one feature that must be implemented to follow company policy/procedure and I'll try to describe it as simply as I can:

            

           * Each customer has an account for various classes/courses (This is a music, rhythmic and language school) and they have their own Account ID

           * Customers may have multiple accounts in cases such as they have mutiple children enrolled in different courses or they are a teacher in a separate school enrolling a class, etc.

           * Customer IDs are multiples of 10 in general (e.g. 1010, 1020, etc.) but in the event of multiple accounts per family/group the IDs increment by 1 starting with a 1 in the last digit.  (e.g. 1011, 1012, 1013)

           * In the montly data we output in CSV format to give to the bank for their proprietary system, we do not wish to charge the same customer twice, so using the previous example, all the other records output for accounts 1011,1012,1013 would just use the data from 1011, and the total monetary amount in the last field would be summed over the three records and output at the end. The staff has had bad habits in the past though and the name, address and bank account information fields in the case of multiple accounts is not always present beyond the first record (which doesn't matter since it should only use the data in the first record)

            

           So:

           ID         (Various name, address and bank account fields)      Total Amount

           1011             (stuff)                                                                              1000

           1012             (stuff)                                                                              1500

                1013             (stuff)                                                                              2000

                 

           Should be output as

            

           ID         (Various name, address and bank account fields)      Total Amount

           1011             (stuff)                                                                              4500

            
           I currently have a silly python script that runs on the data to do just this but it is kind of a kludgy hack and I would like a more elegant solution.

        • 1. Re: Summarizing and Merging Data based on Partial matching on a field?
          philmodjunk

               Ouch! What happens if your customer has 11 accounts? Would they be numbered 1010...1020?

               Assuming that they never have more than 10 accounts (a very dangerous assumption), a calcualtion field can compute a common value for all account numbers from 1010 through 1019:

               Div ( AccountID ; 10 )

               in our example will produce 101 for values 1010...1019. and 1020 will produce 102.

               You can then define a summary field that computes the total of your Total Amount field and then you can export your data after sorting the records by this new calculation field (and then by the account ID field) and use the Group by option to get one row for each group of account IDs.

               I also strongly recommend that you move towards a different system where each customer gets a separate unique ID (not based on this account ID) in a different table of one record for each customer that you then link to these individual records.

          • 2. Re: Summarizing and Merging Data based on Partial matching on a field?
            AaronTokunaga

                 Thanks for the reply!

                 Yes, the underlying design needs a major overhaul but since all of the staff are stretched to their breaking point, we probably won't have time to sit down and rethink the overall structure for a few months at least at this rate. I would think a single customer table and then a related order/course/programme table linked by customer ID would work much better. There are a good number of combinations for every course type/day/time/instructor but it isn't out of the realm of possibility to list them all up and just have the user select one or multiple from a big list. A separate layout could be used to edit the course information.

                 Back to the present system...

                 So far, the largest number of accounts per customer is 4, but a worrying fact is the account numbers go as high as 6200 presently. The system was started a decade ago and based on the growth rate and historical stats we should be all right for a while longer until we can redesign everything but this is just one of the many horrors of the system designed many years ago by staff now long gone. Until now, every field was a text field and things like date fields were broken up into separate Year name (Japanese : Heisei, Showa, etc.) Year, Month, Day fields, though it is a single date field now and even then there were combinations of single and double byte characters and formats were really loose with too many special cases.

                 Okay, I'm going to make a calculation field like you say and see if I can get things working this way. The python script I have does sort of what you suggested. It takes the first three digits and groups records like that, summing the last field output at the end of each "group" (It works better than it sounds) but I would prefer not to do this since the users of the system aren't ones I would trust mucking with exported data and feeding it to a python script.

                  

                 EDIT> Okay, I gave it a try and I was able to achieve the results I was looking for and was able to do away with the secondary python script. Until the system can be redesigned this seems to work very nicely. Thanks a lot.