2 Replies Latest reply on Oct 13, 2010 11:02 AM by davidanders

    Long-term strategy for distributing data across multiple files





      Hi everyone,

      I'm building a solution that has been centered on a single project per file (FileMaker Pro 11 + FileMaker Server 11 Advanced). Now the company would like to use data from all projects for business intelligence purposes. I am reluctant to integrate all data into a single file, for several reasons:

      1. The file will get huge. Projects contain about 50-200 MB of data (including thumbnail pictures; linking to external files is not really an option). Even at 10 projects a year, that means several gigabytes of data in the not-too-distant future.

      2. Backups and consistency: not only is it impractical to back up EVERYTHING once every hour when only one project is active at a time; the risk is also high, because if something breaks, EVERYTHING will be messed up.

      3. Speed. I'm afraid that data lookup will become increasingly sluggish as record counts go up.

      Therefore it seems practical to me to use a separate file for every project. Is it possible to create a business intelligence file that will accumulate horizontal data from all project files? I'm afraid the answer is no, but I thought I'd ask. Thanks!


        • 1. Re: Long-term strategy for distributing data across multiple files

          There are several factors to consider. The fact that files are large does not mean they are necessarily much slower, if much of the size is because of container data and large text fields. But yes, they will be somewhat slower. And yes, a file size of many GB, with all the data in one file is a little worrisome.

          Since you say that the main use of the aggregated data would be "business intelligence", I wonder if you could treat it as "archive" data, which is not expected to change. If so, there is no real problem (though a fair amount of work).

          Then all your separate Project files could "push" data to a central file when a project is finished (for good). The central file would not be fully up to date, but that might be enough for what they want. It would have much the same structure as the project files, which makes it fairly easy to create from a clone, keeping only the necessary fields (both now and in the future). It could also shed all the many structures needed for data entry and processing (including tables, fields, and scripts that aren't needed).

          [ I would do this before even attempting to set up any Import scripts.]
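          A rough sketch of that push-on-completion pattern, modeled here with SQLite in Python, since the FileMaker Import steps themselves aren't shown in this thread (the table and field names are invented for illustration):

```python
import sqlite3

# Two in-memory databases stand in for a Project file and the central file;
# in FileMaker these would be separate files and a scripted Import step.
project = sqlite3.connect(":memory:")
project.execute("CREATE TABLE costs (id INTEGER, item TEXT, amount REAL)")
project.executemany("INSERT INTO costs VALUES (?, ?, ?)",
                    [(1, "design", 1200.0), (2, "print", 450.0)])

archive = sqlite3.connect(":memory:")
# The central file keeps only the fields needed for analysis, plus a
# project tag so rows from many project files stay distinguishable.
archive.execute("CREATE TABLE costs (project TEXT, item TEXT, amount REAL)")

# The "push": run once, when the project is finished for good.
rows = project.execute("SELECT item, amount FROM costs").fetchall()
archive.executemany("INSERT INTO costs VALUES ('Project A', ?, ?)", rows)
archive.commit()
```

          The key design choice is that the push is one-way and happens only at project close, so the central file never needs live synchronization.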

          Most calculations, especially complex relational ones, could be flattened, with the results stored in plain fields (mostly number fields). This can make using the data up to hundreds of times faster, making it much better suited to mass analysis.
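          Flattening a relational calculation into a stored value can be sketched like this (again using SQLite as a stand-in; the table and field names are hypothetical):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT,
                           total REAL);       -- flattened result field
    CREATE TABLE line_items (project_id INTEGER, amount REAL);
""")
db.execute("INSERT INTO projects (id, name) VALUES (1, 'Brochure')")
db.executemany("INSERT INTO line_items VALUES (1, ?)", [(100.0,), (250.0,)])

# Flatten once, at archive time: replace the live relational calculation
# (a sum over related records) with a stored plain number.
db.execute("""
    UPDATE projects SET total =
        (SELECT SUM(amount) FROM line_items
         WHERE line_items.project_id = projects.id)
""")

# Later analysis reads the plain field directly; no join is evaluated.
total = db.execute("SELECT total FROM projects WHERE id = 1").fetchone()[0]
```

          Since archived data never changes, the stored result can never go stale, which is what makes this safe.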

          Each Project file (and any other relevant files) would reference the central file as an External Data Source, with the relevant tables on the Relationship Graph, the layouts needed, and all the necessary procedures to push the "fixed" data, usually Imports. Once you have this, that Project file becomes the template for all new ones.

          If this structure and these procedures are extensive (likely), my approach would be to first build a solid "import a project file into a clone" routine, to bring the existing Project files into the new structure. It MUST reset the "next serial value" of every relevant table. Yes, a bit of a PITA. But:

          1. You should have this anyway. It is the best way to restore crashed or damaged files.
          2. Once built, it takes a while to run for each file, but it is completely automated.
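          The serial-reset step after a bulk import might look like this in outline (SQLite as a stand-in once more; in FileMaker this would be the Set Next Serial Value script step driven by something like Max(ID) + 1):

```python
import sqlite3

db = sqlite3.connect(":memory:")
# AUTOINCREMENT gives the table an explicit "next serial" counter
# (sqlite_sequence), standing in for a clone's next-serial-value setting.
db.execute("CREATE TABLE contacts "
           "(id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
# Records imported into the clone keep their original IDs.
db.executemany("INSERT INTO contacts (id, name) VALUES (?, ?)",
               [(7, "Ann"), (12, "Bo")])

# Reset step: push the counter past the highest imported ID so newly
# created records can never collide with imported ones.
max_id = db.execute("SELECT MAX(id) FROM contacts").fetchone()[0]
db.execute("UPDATE sqlite_sequence SET seq = ? WHERE name = 'contacts'",
           (max_id,))

db.execute("INSERT INTO contacts (name) VALUES ('Cy')")
new_id = db.execute("SELECT id FROM contacts "
                    "WHERE name = 'Cy'").fetchone()[0]
```

          (SQLite happens to advance this counter automatically; a FileMaker clone does not, which is why the explicit reset is the step that must not be skipped.)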

          • 2. Re: Long-term strategy for distributing data across multiple files

            This is how I handled a Quoted Price vs Actual Cost for an advertising agency.

            It was to allow the principals to gauge the salesmen's price-quote history.

            David Anders
            The Computer Guy, Seattle
            An animation I made