1 2 Previous Next 23 Replies Latest reply on Nov 21, 2014 2:22 PM by Mike_Mitchell

    Which approach to having multi-year data?

    njem

      Which approach to having multi-year data?

       

       

      I have what has grown to be a rather complex db. It is for a non-profit that tracks fundraising events and donors and participants. Some events get complex with all the details to track. The structure I inherited just has current year data available. Previous years' data for a participant, or donor, or a company that's helping, is moved to separate fields and tables at the end of the year and is somewhat orphaned there.

       

       

      For the most part workers need to see current year data but sometimes they need to refer to previous years' data. Participants, too, see their account by web and need to see their previous data.

       

       

      I could mix all the data together with year fields to separate them. So when we just want to see this (coming) year's data we filter to records marked 2015. That would require going through an enormous array of relationships, many that are many levels deep, and finding everyplace where the relationship would now need to be filtered. (I.e. a relationship that used to be a simple "donor ID = donation ID" would now have to have AND "yr = 2015") I suppose the filter would be based on some global so it could filter for "2015" or "2014" or ">2000<2016". I would worry that so much filtering of so many relationships would slow things down or create other complications.

       

       

      Another option is to leave previous years' data in sort of a separate world, separate tables and/or separate fields. The data would be a little less integrated. To view previous years would require a different layout or tab. Summary fields would be a little more complicated (how many donations has a donor given across years) but that could be worked out.

       

       

      There may be other methods I'm not thinking of. There must be good methods of doing this in really huge DBs beyond what I'm used to working on.

       

       

      Any thoughts on the pluses and minuses of different methods, or other methods? Or white papers I haven't found or such?

        • 1. Re: Which approach to having multi-year data?
          Mike_Mitchell

          Your question is actually two questions:

           

          1) What is the correct data model for multi-year data?

           

          2) What is the best way to deal with my current system and the way it's currently modeled?

           

          Those are not the same question.

           

          Reading between the lines, you already know the answer to (1): Put everything in a single table with corresponding date fields for the donations. There's nothing special, from a data modeling standpoint, about the date of a particular donation. But you're wincing at the pain (or potential pain) of the damage it (might) do to the way the system is currently configured.

           

          I deal with this frequently. I inherit databases from people who think in fiscal year blocks and duplicate their databases once a year to start a new year. When I suggest that all their data should be kept in a single file, rather than starting a new database every year, the first question is almost inevitably, "But how will I tell this year's data from last year's?" Obviously, filtering / finding / sorting.

           

          From the information you've provided, it appears that the system you're dealing with is using relationships and portals to display the data. This is really not as much of an issue as you might think. Filtered portals can service here without even impacting the Graph. Failing that, you can script List View secondary windows or popovers (depending on your needs and the particulars) that won't dramatically impact performance. Even a Virtual List can be very useful in these situations, as you can fetch the necessary records using an ExecuteSQL ( ) call and then display them.

           

          I don't know enough about your particular database to say definitively that you should combine all your data irrespective of year. However, I can tell you that, in virtually every case I've ever seen, if access to previous years' data is required, the headaches associated with splitting it out will far exceed the up-front issues of configuring it for overall access within just a few years. The only time I'd say otherwise would be if you're archiving off the data and would not need to see it except perhaps with developer intervention.

           

          HTH

           

          Mike

          • 2. Re: Which approach to having multi-year data?
            keywords

            Just an opening thought here:  what about creating a utility table with just a Year field (or global utility field even), then using that to filter the view to whatever year you want to deal with. That should not require tinkering with all your existing relationships.  I haven't thought it through much further, but that my send you down a thought line.

            • 3. Re: Which approach to having multi-year data?
              njem

              So you think all that extra filtering won't slow things down? I hope you're right. I'm inclinde to go that way but it will be a huge project straightening this out and then if it's unbearably slow it's going to have been a huge waste of time. But I guess I won't know till I try.

              • 4. Re: Which approach to having multi-year data?
                Mike_Mitchell

                I have no way of knowing how much it will slow things down, since I haven't seen the existing database. But I will tell you that, provided all the fields are properly indexed and the structure of the database is sound, performance on indexed Find operations is typically very fast.

                • 5. Re: Which approach to having multi-year data?
                  Mike_Mitchell

                  Just a few thoughts to follow up:

                   

                  1) I have a solution that has a similar setup. It has a Person table with Events. Each Event record has a Category and a Date. There's a portal on each Person record that displays that person's events. The relationship joins personID = personID, a global categoryList = categoryID, and a dateRange = eventDate. Users can filter by category or select all categories (which creates a multikey of all possible categoryID values in categoryList), and they can filter by date (either all dates, which is a date range from 01/01/0001 - 12/31/2999). Performance is fine, even over the WAN.

                   

                  2) If you find the performance to be problematic, or if you're just concerned about it, you can follow keywords' suggestion and create a Year table. This can let you create a multikey of all year values the user has selected, joining to a Year field in the child table, which will be considerably faster than using a <> range join.

                   

                  HTH


                  Mike

                  • 6. Re: Which approach to having multi-year data?
                    taylorsharpe

                    FileMaker speed is almost never limited by number of records, it is usually limited by unstored calculations and calculations that work through related tables.  The type of data most non-profits keep about donors is typically stored and indexed (e.g., gift date, who its from, amount, budget code, description, etc.).  I had a FileMaker table with over 83 million records of stored and indexed fields and it rarely would take more than a 2-3 of seconds over the LAN to find anything in it.  But I can also show you a table with 1000 records and some really weird relationships and unstored calcluations that can bring it to its knees. 

                     

                    The trouble is that people often think the number of records is what determines performance in FileMaker.  It is a factor, but a pretty diminimus one.  Your schema, indexing, portal filtering/sorting, and unstored calcluations will have a lot more effect on performance. 

                     

                    The trouble with separating out data into various files or tables is that you cannot easily make a report that pulls data from the various years.  That is why a good model keeps everything together with its type of information and does not break things out by criteria such as date ranges or coding, etc.  The power of Find/Search will meet your reporting needs and providing more reporting capability if it is all in one table. 

                    • 8. Re: Which approach to having multi-year data?
                      beverly

                      Well, TS. I guess that might depend on the separation and the type of report and from which perspective.  Muliple year reports might easily come FROM the donations table with dates & amounts and pull other related data.

                       

                      Finding in a portal, while possible may or may not be as "expected". So we come up with alternatives there, too.

                       

                      Beverly

                      • 9. Re: Which approach to having multi-year data?
                        DavidJondreau

                        Don't forget network speed and latency having an effect on performance. A local file can do lots of non-optimized stuff that a hosted file on a poorly configured server and network woul be impossible.

                         

                        Also, portal filtering is on the table here so number of records do matter.

                         

                        If there are a dozen donations a year, and data going back 5 years, that's 60 records being accessed, rather than 12. That's probably the limit before you start noticing differences (depending on other factors of course). If there are a 100 donations a year, and data going back 10 years, then that filter has to work through a 1,000 records, and that will suck.

                        • 10. Re: Which approach to having multi-year data?
                          Mike_Mitchell

                          The system I referenced has over 30,000 line items in it. No appreciable degradation in performance due to filtering.

                          • 11. Re: Which approach to having multi-year data?
                            Mike_Mitchell

                            With the qualifier that "filtering", in this case, does not mean a "portal filter", but rather a purpose-built multi-predicate relationship.

                            • 12. Re: Which approach to having multi-year data?
                              DavidJondreau

                              Right. A relationship uses the index. That's pretty quick.

                               

                              A portal filter loads each related record and evaluates the calculation for each. For larger record sets, that can be pretty slow.

                              • 13. Re: Which approach to having multi-year data?
                                Mike_Mitchell

                                Agreed. I should probably use the word "filter" with more explanation.  

                                • 14. Re: Which approach to having multi-year data?
                                  steve_ssh

                                  Hi Mike,

                                   

                                  I have a question about your solution implementation.

                                   

                                  Given that you mentioned that the user can possibly filter on a date range:

                                   

                                     Are you doing something so that it is not necessary to use an inequality comparison in the date predicate?

                                   

                                   

                                  My experience with using an inequality/date constraint is that it performs poorly, and I seem to recall others saying the same.

                                   

                                  I'm guessing that perhaps you are using a multi-key set up on one side of the constraint, but I figured I'd ask in order to educate myself, and also perhaps tease out some more information which could be useful w.r.t. the OP.

                                   

                                  If you are successfully using a date inequality in your relationship, then I need to go back and reconsider what I thought I knew...

                                   

                                  Very best regards & thank you,

                                   

                                  -steve

                                  1 2 Previous Next