8 Replies Latest reply on Feb 12, 2015 5:38 PM by Jason_Farnsworth

    Guarding against growth (size) in large Databases (Best Practices)

    Jason_Farnsworth

      As requested, I have created a new thread with this topic,

       

      If ~100 years of data exists and thus needs to be managed, but only ~1 year of information (current projects) is accessed on a consistent basis, then the actively used data represents only about 1% of the total.

       

      Why not create two databases: a day-to-day operating one, and an archive of closed-out info? Closed-out jobs might get accessed on occasion, very rarely, or perhaps never at all. In fact, why not compile the data that goes into the archive as one large text document (i.e. inspector notes) and still save off the large PDFs? This would dramatically reduce the size of the database and the complexity of the archive data, and keep its format very simple. The offset is writing and maintaining the compiler as the database morphs. I would think backups would be much simpler with the core of the archive data never changing.
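
      To make that concrete, here is a rough sketch of what I have in mind for the compile step, written as plain Python against a made-up job/notes structure (nothing here is FileMaker-specific; field names like job_number and inspector_notes are invented just for illustration):

      ```python
      # Rough sketch only -- assumes the closed-out job data has already been
      # pulled out of the operating database into simple Python structures.
      # Field names (job_number, inspector_notes, plan_pdfs, ...) are invented.
      from pathlib import Path

      def compile_job_narrative(job: dict, archive_dir: Path) -> Path:
          """Flatten one closed-out job into a single plain-text narrative file."""
          lines = [
              f"Job {job['job_number']} - {job['title']}",
              f"Closed: {job['closed_date']}",
              "",
              "Inspector notes:",
          ]
          for note in job["inspector_notes"]:
              lines.append(f"  {note['date']}: {note['text']}")

          lines.append("")
          lines.append("Plan PDFs saved alongside this narrative:")
          for pdf in job["plan_pdfs"]:
              lines.append(f"  {pdf}")

          out_file = archive_dir / f"job_{job['job_number']}.txt"
          out_file.write_text("\n".join(lines), encoding="utf-8")
          return out_file
      ```

      Once a job's narrative file exists and its PDFs are saved off, the job's records could be scrubbed from the operating database.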

       

      Now, with all the archive data stripped from the operating database so that it contains only ~1 year of active data, it would seem that no extreme measures should be needed to guard against constant growth.

       

      What are your thoughts?

       

      Jason Farnsworth

        • 1. Re: Guarding against growth (size) in large Databases (Best Practices)
          mikebeargie

          First, I wholeheartedly agree, and I have set up a few solutions to archive 10+ years of data with general success.

           

          I would recommend setting up your database using the data separation model; it will save you a lot of time later during re-archive operations. The trade-off is that if you change the data schema, you will need to change it in two places and update any "send to archive" scripts. But you will never really need the UI in the archive file, so data separation allows you to drop that part.
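
          Just to show the shape of a "send to archive" routine: it really is just "copy everything past the cutoff into the archive file, commit, then delete it from the live file." A back-of-the-napkin sketch in Python/SQLite terms (not FileMaker script steps; the jobs table and its fields are invented for illustration):

          ```python
          # Back-of-the-napkin "send to archive" sketch, expressed with SQLite
          # instead of FileMaker script steps. Table and field names are invented.
          import sqlite3

          def send_to_archive(live_path: str, archive_path: str, cutoff_date: str) -> int:
              """Copy jobs closed before cutoff_date into the archive file, then remove them."""
              live = sqlite3.connect(live_path)
              archive = sqlite3.connect(archive_path)
              try:
                  rows = live.execute(
                      "SELECT id, job_number, closed_date, notes FROM jobs "
                      "WHERE closed_date < ?",
                      (cutoff_date,),
                  ).fetchall()

                  archive.executemany(
                      "INSERT OR IGNORE INTO jobs (id, job_number, closed_date, notes) "
                      "VALUES (?, ?, ?, ?)",
                      rows,
                  )
                  archive.commit()

                  # Only remove from the live file after the archive insert has committed.
                  live.execute("DELETE FROM jobs WHERE closed_date < ?", (cutoff_date,))
                  live.commit()
                  return len(rows)
              finally:
                  live.close()
                  archive.close()
          ```

          The point is that the whole routine hangs off one table definition, which is why keeping the schema mirrored between the live and archive files matters.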

           

          Secondly, as long as the client agrees that _X_ date is a good cutoff point, then the only thing you need to worry about is any reporting that references the archived data.

           

          I tackle this one of two ways:

          1) Run all of the reports as of _X_ date and store them for ease of reference. Kludgy, but actually meets the needs of 95% of ancient data files.

          2) Seed a record with the sum totals of all the data before _X_. That way your current database has one record that represents the sum totals of all prior data, so it can be included in future reports. It's important to hard-code a warning stating that if that record is included, the report covers data from the beginning of history. (See the sketch after this list.)
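
          For option 2, the seed record is literally one row that carries the pre-cutoff totals plus a note so any report that includes it says so. Roughly, in generic Python terms (field names invented; adapt to whatever your schema actually sums):

          ```python
          # Sketch of option 2: collapse everything before the cutoff into one
          # "seed" record holding the historical totals. Field names are invented.
          from datetime import date

          def build_seed_record(old_records: list[dict], cutoff: date) -> dict:
              """Sum all pre-cutoff records into a single record for the live file."""
              prior = [r for r in old_records if r["invoice_date"] < cutoff]
              return {
                  "record_type": "HISTORICAL_SEED",
                  "invoice_date": cutoff,
                  "total_amount": sum(r["amount"] for r in prior),
                  "record_count": len(prior),
                  # Hard-coded warning so any report that includes this row flags it.
                  "note": (f"Roll-up of {len(prior)} archived records "
                           f"covering all data prior to {cutoff.isoformat()}."),
              }
          ```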

           

          In more complex solutions, you can actually use something like PSoS to retrieve a specified set of records from the archive. Say, for instance, you needed a complete job history. You could parse that into a temp table for the report, so the data never lives in the "live" system.
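
          Stripped of the FileMaker/PSoS specifics, that last idea is just "fetch one job's history from the archive and park it in a session-only temp table for the report." A rough sketch (SQLite again standing in for illustration; all names invented):

          ```python
          # Rough shape of "pull archive records into a temp table for one report".
          # SQLite stands in for FileMaker/PSoS here; all names are invented.
          import sqlite3

          def load_job_history_into_temp(live_conn: sqlite3.Connection,
                                         archive_path: str,
                                         job_number: str) -> int:
              """Copy one job's archived history into a session-only temp table."""
              live_conn.execute(
                  "CREATE TEMP TABLE IF NOT EXISTS job_history_tmp "
                  "(job_number TEXT, entry_date TEXT, note TEXT)"
              )
              archive = sqlite3.connect(archive_path)
              try:
                  rows = archive.execute(
                      "SELECT job_number, entry_date, note FROM job_history "
                      "WHERE job_number = ?",
                      (job_number,),
                  ).fetchall()
              finally:
                  archive.close()
              live_conn.executemany("INSERT INTO job_history_tmp VALUES (?, ?, ?)", rows)
              # The TEMP table disappears when the connection closes, so nothing
              # archived ever lands permanently in the "live" file.
              return len(rows)
          ```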

          • 2. Re: Guarding against growth (size) in large Databases (Best Practices)
            mbraendle

            Just wondering: Do you already have 100 years of data or plan to have 100 years of data (the latter would then finally be handled by FileMaker 81 ...)?

             

            On a retrospective basis, are you aware that the OCR quality of old documents is worse because current OCR software is not trained on the fonts that were used at that time?

            Looking forward, is there a preservation plan (e.g. with an analysis of the file formats, procedures to convert into archive-safe formats such as PDF/A, a proper metadata schema, and fallback positions if FileMaker ceases to exist)? Would FileMaker be the right tool for long-term archiving? Do any recommendations exist?

             

            Just my librarian point of view.

            • 3. Re: Guarding against growth (size) in large Databases (Best Practices)
              mikebeargie

              I haven’t worked with container data, mostly because the 25-year-old data I was archiving at the time came into existence before PDF was a thing. OCR has never played into any of my solutions.

               

              Considering you can still open Adobe PDF v1 in the modern reader, I’m not sure that type of thing is ever really going to be an issue.

               

              Your thoughts seemed more geared towards the how, and not the what of storage.

              • 4. Re: Guarding against growth (size) in large Databases (Best Practices)
                Jason_Farnsworth

                Mike,

                 

                Thank you for the reply, and for the thoughtful time you took to make it.

                 

                Do you have a crude example of the data separation model? I would like to see the mechanics of it, and see how it responds.

                 

                I think an _X_ date cut-off point may not be the trigger for archiving, but rather the completion of the project, as each project will take a different amount of time to complete, though all complete within a relevant range of dates.

                 

                I had planned (or considered) compiling the data (once it is complete), including all related data, into one text file that is a basic narrative for future reference, and then scrubbing it from the operating database to keep performance high.

                 

                I understand your point that schema changes down the road will force a constant revisit of the compiling scripts. Perhaps the data separation model streamlines that process and makes future adjustments easier.

                 

                Having one narrative of archived data would make searching easier, I would think. Also, not to forget, the PDF of the final amendment of the project's plans will be resident in the archive.

                 

                Jason Farnsworth

                • 5. Re: Guarding against growth (size) in large Databases (Best Practices)
                  Jason_Farnsworth

                  I have 100 years of data to contend with at present.

                   

                  I hate to say it, but a good range of the years are all on microfiche and will need to be converted to PDF, without any expectation of picking up the text for searches.

                   

                  Having said that, I do not feel a text search on a set of prints (a set of building prints) will be what is looked for in an archive search, but rather the information associated with the prints.

                   

                  I feel that keeping the archive simple is the way to guard against FileMaker fading out.

                   

                  Jason Farnsworth

                  • 6. Re: Guarding against growth (size) in large Databases (Best Practices)
                    mikebeargie

                    I don’t have a sample handy, but see these two resources:

                    http://www.seedcode.com/pmwiki/pmwiki.php?n=SeedCodeComplete2.SeparationModel

                     

                    https://www.youtube.com/watch?v=h2IgbhpPjn4

                     

                    It’s relatively easy to implement: basically, you duplicate your file, change the UI file so its table occurrences point to the tables in the data file, then delete the tables from your UI file. There are more steps with security and such, but that's the basic idea.

                    • 7. Re: Guarding against growth (size) in large Databases (Best Practices)
                      mikebeargie

                      You might want to look at your OCR/scan-to-PDF solution outside of FileMaker if it comes to it. Most of the FileMaker solutions available are heavily reliant on plugins (e.g. TWAIN) that can perform poorly.

                      • 8. Re: Guarding against growth (size) in large Databases (Best Practices)
                        Jason_Farnsworth

                        Scanning outside of FileMaker is a given in my mind.