6 Replies Latest reply on Sep 23, 2014 10:48 AM by davidanders

    4 Million Records Optimizations?





      I am working with a table that has over 4 million records, and I am seeking any FileMaker Pro best-practices resources on working with a table like this. I am using ExecuteSQL statements to query found sets, and it is very slow. Any suggestions are welcome.

        • 1. Re: 4 Million Records Optimizations?

          I find ExecuteSQL very slow if you are returning more than a few records (fewer than 100 or so), so I only use it when I expect only a few records to be returned.

          The native FileMaker find is very fast, especially if you're searching indexed fields. Returning a found set of a few thousand records out of 4 million should take only about a second, provided you search on an indexed field. So whatever field you're searching, turn on full indexing for it.
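          FileMaker's field index isn't SQL, but the effect of searching an indexed versus unindexed field can be sketched with SQLite (the table and field names here are invented purely for illustration):

```python
import sqlite3

# Build a modest in-memory table with no index on the search field.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO people (name) VALUES (?)",
                 [(f"name{i}",) for i in range(200_000)])

def find(target):
    return conn.execute(
        "SELECT id FROM people WHERE name = ?", (target,)).fetchall()

rows_before = find("name12345")                     # full table scan
conn.execute("CREATE INDEX idx_name ON people(name)")
rows_after = find("name12345")                      # index lookup, same result
print(rows_before, rows_after)
```

          The results are identical; only the work done differs. In FileMaker terms, turning on full indexing for the searched field is what moves the find from the slow path to the fast one.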

          Never, ever search an unstored calculation in a file this size. I've had FileMaker crash after accidentally searching an unstored calc and clicking Cancel. Of course, any calculation that uses a relationship is automatically unstored and should never be searched. In that case it's better to break normalization: keep an indexed copy of the field in the current table and use a script trigger or some other method to keep the two synchronized, rather than using a calculation with a relationship.

          Relationships are also fast. Use a global field in a relationship (either to another table or linked back to the same one): type your search criteria into the global, then use "Go to Related Record" to get the found set.


          Hope that helps.


          • 2. Re: 4 Million Records Optimizations?

            ExecuteSQL does not "query found sets". It queries every record in the table, and I doubt you'll get very fast results using it on a table of 4 million records.

            I must politely disagree with Brian's advice to never, ever search an unstored calculation. They are not always avoidable. The key is to avoid using them in contexts involving large record sets, such as "Show All Records" or a relationship that matches very large numbers of records, as this can literally bog everything down, not only for the user who initiates such a scenario but also for all other clients of the same database.

            Thus, your interface design, and scripts that keep users from accidentally tripping such land mines, are vitally important.

            When performing a find, if you need to specify criteria in both indexed and unindexed fields, you can save huge amounts of time with a two-stage find: find records specifying only the criteria in indexed fields, then return to Find mode, specify the remaining criteria in the unindexed fields, and constrain the found set. Since the second stage is applied only against the found set produced by the first, this is usually much, much faster than trying to do this type of search in a single find operation.
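            The two-stage find can be sketched in Python with SQLite standing in for FileMaker: stage one narrows on an indexed column, and stage two applies the expensive test only to the small found set. All names here are illustrative assumptions, not actual FileMaker calls:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.execute("CREATE INDEX idx_region ON orders(region)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, "east" if i % 4 == 0 else "west", i * 1.5)
                  for i in range(1, 10_001)])

# Stage 1: indexed criteria only -- fast narrowing to a small found set.
found = conn.execute(
    "SELECT id, amount FROM orders WHERE region = ?", ("east",)).fetchall()

# Stage 2: the expensive, unindexed test (standing in for an unstored
# calculation) runs only over the found set -- FileMaker's "constrain".
def expensive_check(amount):
    return amount > 10_000

constrained = [row for row in found if expensive_check(row[1])]
print(len(found), len(constrained))
```

            The expensive test touches 2,500 rows instead of 10,000; against 4 million records the savings are proportionally far larger.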

            I suggest that you also do some web searches on FileMaker optimization to learn even more options for improving performance.

            • 3. Re: 4 Million Records Optimizations?

              I don't know if this is pertinent in your case, but Phil's comment about bogging down other clients got me thinking, so I thought I'd share how my predecessor kept slowdowns from affecting other clients.

              We run a lot of complex reports that can take several minutes to run (some of which unavoidably need to search unstored calcs, although none of those tables are anywhere near 4 million records). When the reports run, they bog down performance for all the other users. So what we've done is create a duplicate server just for running reports, while the main server is for general use. Every night the reports server shuts down FileMaker Server, initiates a Unix script to copy a backup set from the main server, then starts FileMaker Server up again. This has several advantages.

              1. You can run reports without slowing down other users. (obviously)

              2. You have a quick reference if a user mistakenly changes some data: you can quickly see what it used to be and reenter it without having to go to backups.

              3. It gives the developers a place to experiment with current data. Any changes you make are not permanent, as they get wiped out each night, so it becomes a bit of a playground for trying out ideas. If an idea works, you duplicate it on the development server to flesh it out further. If it doesn't, you just leave it, and the next morning it's gone. No harm, no foul. We just have to be careful that any experiments don't affect the reports.

              • 4. Re: 4 Million Records Optimizations?

                That's not a bad idea at all if you have the resources to make it happen. I often test ideas on a local copy first, specifically to ensure that I don't bog the system down with a new feature, which is a little similar to what Brian suggests.

                Another tactic is to reduce the number of items that have to be calculated when unstored calculations or summary field values are part of your search and/or reporting process. This doesn't work with all forms of data, but if you can "pre-calculate" values frequently needed in reports, you can greatly speed up results. An example is a typical Invoice-to-LineItems relationship with an unstored calculation computing the total cost (or value) of the listed line items. You can add a number field to Invoice, and when the user clicks a button to "complete" the sales or ordering transaction, a script stores the current value of the unstored calculation in that number field. Since the details of a completed transaction should never change, you can now search or sort on the stored value when working with the total value of your invoices.
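                A rough sketch of the "store the total on completion" idea, using SQLite as a stand-in for the Invoice and LineItems tables (the schema and names are assumptions for illustration, not FileMaker's):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (id INTEGER PRIMARY KEY, total REAL);  -- stored, indexable
CREATE INDEX idx_total ON invoices(total);
CREATE TABLE line_items (invoice_id INTEGER, qty INTEGER, price REAL);
""")
conn.execute("INSERT INTO invoices (id, total) VALUES (1, NULL)")
conn.executemany("INSERT INTO line_items VALUES (?, ?, ?)",
                 [(1, 2, 9.99), (1, 1, 25.00)])

def complete_invoice(invoice_id):
    """On 'complete', freeze the live computed total into the stored field."""
    (total,) = conn.execute(
        "SELECT SUM(qty * price) FROM line_items WHERE invoice_id = ?",
        (invoice_id,)).fetchone()
    conn.execute("UPDATE invoices SET total = ? WHERE id = ?",
                 (total, invoice_id))

complete_invoice(1)
(stored,) = conn.execute("SELECT total FROM invoices WHERE id = 1").fetchone()
print(stored)
```

                After completion, searches and sorts hit the stored (indexed) field instead of re-evaluating the calculation across related records.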

                An even more ambitious trick is to have a "close of business day" scheduled script run through all the day's line item records and create new records in a summary table: one new record, with today's date, for each product ID found on at least one of the day's invoices. Number fields in this summary record then hold total quantity, total cost, average cost, etc. One entry can combine line item data from many different invoices, so a summary report of the day's, month's, quarter's, or year's sales computes totals from a much smaller set of records than if you used the original line item records.
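                The close-of-day summary script might look roughly like this, again with SQLite as an analogy (table and field names are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE line_items (sale_date TEXT, product_id TEXT, qty INTEGER, cost REAL);
CREATE TABLE daily_summary (sale_date TEXT, product_id TEXT,
                            total_qty INTEGER, total_cost REAL, avg_cost REAL);
""")
conn.executemany("INSERT INTO line_items VALUES (?, ?, ?, ?)", [
    ("2014-09-23", "A", 2, 10.0),   # two invoices sold product A today
    ("2014-09-23", "A", 3, 12.0),
    ("2014-09-23", "B", 1, 99.0),
])

def close_of_day(day):
    """Nightly script: one summary row per product sold that day."""
    conn.execute("""
        INSERT INTO daily_summary
        SELECT sale_date, product_id, SUM(qty), SUM(cost), AVG(cost)
        FROM line_items WHERE sale_date = ?
        GROUP BY product_id""", (day,))

close_of_day("2014-09-23")
summary = conn.execute(
    "SELECT product_id, total_qty, total_cost FROM daily_summary "
    "ORDER BY product_id").fetchall()
print(summary)
```

                Later reports then total a handful of summary rows per day instead of re-scanning every line item.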

                • 5. Re: 4 Million Records Optimizations?

                  We have something similar as well: a large set of scripts that run at night. They copy unstored calculation fields to regular indexed fields, and they also send out various emails to clients, reminders to staff, etc. All in all, they take a couple of hours to run.
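                  The nightly "copy unstored calcs to indexed fields" batch can be sketched the same way; here a computed account balance stands in for the unstored value (all names are illustrative assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, credits REAL, debits REAL,
                       balance_stored REAL);        -- indexed, searchable copy
CREATE INDEX idx_balance ON accounts(balance_stored);
""")
conn.executemany(
    "INSERT INTO accounts (credits, debits, balance_stored) VALUES (?, ?, NULL)",
    [(100.0, 40.0), (250.0, 300.0)])

def nightly_refresh():
    """Batch-copy the live computed value into the stored, indexed field."""
    conn.execute("UPDATE accounts SET balance_stored = credits - debits")

nightly_refresh()
# Daytime searches hit the stored, indexed copy instead of the calculation.
overdrawn = conn.execute(
    "SELECT id FROM accounts WHERE balance_stored < 0").fetchall()
print(overdrawn)
```

                  The stored copy is at most one night stale, which is an acceptable trade-off for reports that would otherwise search unstored values.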

                  I like your idea of pre-creating parts of reports at night. I'll have to try that.