8 Replies Latest reply on Sep 10, 2016 9:57 AM by philmodjunk

    Data management: To store data or derive it? ... that is the question

    pasleeth

      For the theory wizards among you, How do you decide when to keep numbers solely within the confines of a calculation field as "derived" information, and when it's preferrable to give that data a permanent home within a database?  Here's the real-world example I'm wrestling with, which occurs within the context of student records.

       

      A key statistic of interest in higher ed is the time it takes for students to complete their degrees, something typically reported as a decimal number in units of years.  For the purpose of my calculation, a student's starting term and graduation term are assigned standardized month/day dates (year is variable, of course) that approximate when semesters -- fall, spring, and summer -- usually begin and end.  (Scripting actual semester beginning and end dates is out of the question.)  Computing completion time is simple math ... but here's the storage rub.

       

      1)  If term/semester values are already being stored in lookup tables as four-digit integer codes (CYYM), does it make more sense to store the nominal start and end dates as part of the lookup table rather than calculating them individually on auto-entry for each student record?

       

      2)  For the purpose of computing elapsed days, does it make more sense to create two fields, one for start date and one for end date, in which to store those values for a student, in addition to having the calculation field for completion time that uses those two dates?  Or instead, is there a way to write the calculation so that the terms already being stored in the student record as 4-digit codes (CYYM) are converted into dates/numbers using the lookup tables, fed into the math, and then storing only the resulting elapsed years and nothing else?

       

      3)  And lastly, wherever start/end dates are stored -- if they are stored anywhere -- is it more efficient to have FileMaker store them as numbers using the GetAsNumber function rather than as calendar dates, considering that the actual dates themselves don't ever have to be displayed/output anywhere, only the term codes, which are already part of the visibly displayed record?

       

      -- Andrew

        • 1. Re: Data management: To store data or derive it? ... that is the question
          philmodjunk

          Please explain the meaning of "CYYM". Don't follow what that means. I thought that you meant 0916 meant September, 2016, but can't be sure since CYYM doesn't mean anything to me.

           

          I can answer this:

          is it more efficient to have FileMaker store them as numbers using the GetAsNumber function rather than as calendar dates,

          FileMaker stores dates as integers--the same value returned by getasnumber, so storing them either way makes no real difference when it comes to using the values in calculations.

           

          In general, what you want to consider carefully is whether your calculation can be stored or must be unstored. As long as it can be defined as a stored calculation or as a data field with an auto-enter calculation, the calculation only re-evaluates when a referenced value changes (not not even then for auto-entered calculations if the value is in a global field or related record). So there will be no major efficiency differences as long as that is the case for either option. But if one option requires an unstored calculation and another does not, then you have an issue to evaluate carefully because unstored calculations can produce performance lags in some circumstances (but they also can reliably stay up to date when other methods require a lot of careful handling so this is NOT advice to not use them.)

          1 of 1 people found this helpful
          • 2. Re: Data management: To store data or derive it? ... that is the question
            pasleeth

            "C" represents the date's millennium, "YY" the year's last two digits, and "M" the semester's month designation, which will only be 1, 6, 7, or 8.  So a student who started in 2013 Fall Term would get 2138 for a start term and 8/15/2013 for start date.  And someone who graduated this year in the 2016 Spring Term would get 2161 for an end term and 5/15/2016 for end date. For my particular student population, there are three possible start terms/dates (mostly just two, though), and three possible end terms/dates.

             

            Some parts of the university's record system rely on text fields for terms ("2016 Fall Term"), some the codes (2168), and some both. Consequently, I've had to design both value types into my FMP databases, using my own lookup table to get values as the data type requires.

             

            All that esoterica aside, you've pretty well supplied me the guidance and concepts I needed to attack this problem -- no surprise really, considering your past record with my questions. So once again I offer my sincere appreciation.

             

            -- Andrew

            • 3. Re: Data management: To store data or derive it? ... that is the question
              pasleeth

              pasleeth wrote:

               

              For my particular student population, there are three possible start terms/dates (mostly just two, though), and three possible end terms/dates.

              I should've said three possible for any given year.

               

              My goof.

               

              -- Andrew

              • 4. Re: Data management: To store data or derive it? ... that is the question
                gdurniak

                I'm not sure the answer marked "Correct" is quite complete

                 

                You are really asking if your data should be "Normalized"

                 

                Then yes,  you ideally should not put redundant term information in each Student Record

                 

                However, FileMaker is notoriously slow when calculating ( e.g. summing ) values that are "un-stored", and even worse if "related " ( from another Table )

                 

                If your dataset is relatively small ( e.g. a few thousand records ), then the performance hit will be minimal

                 

                Try it.  If the Normalized data works, then that is always preferred

                 

                But, if printing a report takes forever,  you may then need to "cheat", and use un-normalized stored data

                 

                greg

                 

                > How do you decide when to keep numbers solely within the confines of a calculation field as "derived" information, and when it's preferrable to give that data a permanent home within a database ?

                2 of 2 people found this helpful
                • 5. Re: Data management: To store data or derive it? ... that is the question
                  beverly

                  CYYM is not a color filter (cyan, yellow, yellow, magenta), or the Cowley, Alberta airport.

                   

                  "C" is century for genealogical data

                   

                  I see that pasleeth (Andrew) explained his usage. In the universities I have as clients we use this in FileMaker:

                       TYYYY (term & full year)

                  Where term is (winter, spring, summer, or fall), as it's what is stored in their Oracle data.

                   

                  And I'll add to the "store or derive" topic with:

                  If it is HISTORICAL in any way, store it (even former summaries and calculations). And not just for universities.

                  beverly

                  • 6. Re: Data management: To store data or derive it? ... that is the question
                    gdurniak

                    To say "store it" is a bit misleading here

                     

                    If you require a paper trail,  store the printed report ( PDF ) in a Container,  not the related data  :-)

                     

                    greg

                     

                    > If it is HISTORICAL in any way, store it (even former summaries and calculations). And not just for universities

                    1 of 1 people found this helpful
                    • 7. Re: Data management: To store data or derive it? ... that is the question
                      beverly

                      yes, store as PDF, export to archive, push to something external as well.

                       

                      but when I need a total (summary) of related data on an invoice that has been sent, I script set it and prevent changes. It's no longer a calculation or summary. it's stored data. It's HISTORICAL.

                      • 8. Re: Data management: To store data or derive it? ... that is the question
                        philmodjunk

                        A PDF may be helpful for stored data, but I wouldn't use it as a substitute for storing the actual historical data in a table somewhere. The data in the table can be sorted,  searched, analyzed via FileMaker and other software. Can't do that with a PDF. But the stored data can, as needed, be used to produce the PDF so I'd go for the data over the PDF.

                         

                        And I agree that my response was not "complete". It was never intended to be, just an answer to the most obvious issue. But to me, it sounded like all of the data was local to the same table with the differences described being just different ways to store/calculate that data from within the same record. If true, then there will be little if any observable difference in performance as all of the calculations will be stored calculations.

                         

                        That said, I did once encounter a case where it would have been preferable to use an auto-entered calculation to store a calculated result instead of a calculation field in historical data.

                         

                        Many versions ago, there was a bug in FileMaker's calculation engine that failed to correctly round certain values. When I converted over an archive table of Purchase Orders, to a new version and file format of FileMaker, the fields re-calculated and since the bug had been corrected, the line item amounts for a small number of archival records became different than the actual values that were used in those transactions. An auto-entered calculation would not have recalculated and yes, a stored PDF of the PO also would have kept the same values--though in this case, that would have meant little as the only way that this data is used is in summary type reports that compare historical data across multiple years (and for that, I used a denormalized summary table to get the results that I need quickly.)

                        2 of 2 people found this helpful