11 Replies Latest reply on Oct 30, 2014 7:50 PM by jormond

    250,000 + records

    jurgmay

      Hello,

       

      I have a database with over 250,000 records containing information about people who have taken out magazine subscriptions over the decades. These are currently all in one table, but only around 5,500 are active records - the others relate to people who are no longer alive, have ceased their subscriptions, are no longer contactable or are otherwise 'inactive'.

       

      My question: is there a best practice for handling this scenario? I'm wondering about moving the inactive records to a separate table which is identical to the source table, but maybe there's a better way?!

       

      Many thanks,

       

      Juerg

        • 1. Re: 250,000 + records
          KylePutzier

          Just leave them in there. They're not hurting anything. If you move the dead records into another table, then all you have are two tables where one has a bunch of dead records and the other has some live records AND, eventually, some dead records of its own.

          Create a status field and tag them for what they are.
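 
          For example - this is only a rough sketch, and the layout and field names ("Subscribers", "Status") are assumptions rather than anything from your file - the tag can be a plain text field, and the live people pulled back with a scripted find:
 
              # hypothetical script outline
              Go to Layout [ "Subscribers" ]
              Enter Find Mode [ Pause: Off ]
              Set Field [ Subscribers::Status ; "Active" ]
              Perform Find []
              # the found set now holds only the ~5,500 active records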

           

          Kyle

          • 2. Re: 250,000 + records
            Malcolm

            How is the performance? If only 3% of your data set are active you might consider setting up a very narrow table which carries the IDs of the active records. You can create a one-to-one relationship between this new table and the original. That will allow you to handle the small, active data set without having to modify the existing system very much.
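 
            As a rough sketch (the table and field names here are assumed, not taken from your file): create a table such as ActiveSubs with a single SubscriberID field holding the IDs of the ~5,500 live people, then match it to the main table:
 
                ActiveSubs::SubscriberID  =  Subscriptions::SubscriberID
 
            Dashboard layouts and scripts can then be based on ActiveSubs and only reach the big table through the relationship when they need the detail.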

             

            Malcolm

            • 3. Re: 250,000 + records
              jurgmay

              Hi Kyle,

               

              Whilst they're not hurting anything, it seems obvious - and I should have stated this - that there is going to be a hit on performance when running any kind of analysis on the 5,500 records, as the entire data set must be processed. I figure that it just makes sense to have those records somehow separate from the 250,000 which are of no real value.

               

              Juerg

              • 4. Re: 250,000 + records
                jurgmay

                Hi Malcolm,

                 

                Performance in general use is fine. I do have an issue with some dashboard widgets, but I'm fixing those as I was referencing unstored calcs across the relationship(!). I'm seeing an improvement as I set about fixing that issue. However, I like the sound of your suggestion and will give that a go. Thanks.

                 

                Juerg

                • 5. Re: 250,000 + records
                  gdurniak

                  250K records is very small,  so your "analysis" may not be efficient

                   

                  Normally,  a script would "find" the active records first

                   

                  but if an active record must be compared to all records,  it will slow down,  as the file grows

                   

                  e.g. if all records are self-related, add an "active" key to the relationship, so non-active records are excluded
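 
                  something like this, say (field names are assumed) - keep a stored constant on one side,
 
                      cActive = "Active"    // stored calculation, text result
 
                  and give the self-relationship two match pairs,
 
                      Parent::ID       =  Child::ID
                      Parent::cActive  =  Child::Status
 
                  so records whose Status is not "Active" never appear on the related side, and any aggregate only ever sees the live ones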

                   

                  greg

                   

                   

                  > that there is going to be a hit on performance when running any kind of analysis on the 5,500 records as the entire data set must be processed

                  • 6. Re: 250,000 + records
                    jurgmay

                    Thanks Greg.

                     

                    Even uncomplicated statements like 'Count ( SubsCancelledThisMonth::id )' take 20+ seconds to complete, and I have a number of similar calculations on the dashboard screen. Collectively they take nearly two minutes to complete. Is it better to create a layout with just the fields that I need to look at and script it? I'd assumed a simple calculation would be more efficient, as a script would effectively have many more lines of code.

                     

                    Thanks for your input.

                     

                    Juerg

                    • 7. Re: 250,000 + records
                      gdurniak

                      Yes, unstored calcs across a relationship are very slow

                       

                      and if any relationship uses < or >, it's even worse

                       

                      Dashboards are tricky in general,  since FileMaker is slow at math

                       

                      Perhaps run a script each night,  to store the values you need

                       

                      or, if you have something like a CancelledThisMonth = 1 flag, "find" the active records and show a plain summary, rather than an aggregate Count
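 
                      a bare-bones outline of that nightly script might look like this (the layout, table and field names are assumptions):
 
                          # run on a server schedule, e.g. 2am
                          Go to Layout [ "Subscriptions" ]
                          Enter Find Mode [ Pause: Off ]
                          Set Field [ Subscriptions::Status ; "Active" ]
                          Perform Find []
                          Set Variable [ $activeCount ; Value: Get ( FoundCount ) ]
                          Go to Layout [ "Dashboard" ]
                          Set Field [ Dashboard::ActiveCount ; $activeCount ]
                          Commit Records/Requests [ With dialog: Off ]
 
                      the dashboard then just displays the stored number, instead of recalculating it on every screen load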

                       

                      greg


                       

                      > I was referencing unstored calcs across the relationship(!)

                      • 8. Re: 250,000 + records
                        jurgmay

                        Thanks Greg. I didn't know about the < or > being an issue.

                         

                        I like the idea of an 'end of day' routine to fix the values. I'm not sure they require real-time values so that should work quite well. I can provide a 'refresh' mechanism if they need it.

                         

                        For some of the values a 'find' will work just fine as often I just need the 'Found Count' value - no real math involved - so that's also a good option.

                         

                        Thanks for your help. Much appreciated.

                         

                        Juerg

                        • 9. Re: 250,000 + records
                          ninja

                          +1 to Malcolm's suggestion.

                           

                          When I built an inventory solution where 99% of the stock was active (but knowing things would be archived sooner or later), I built a "thin" Active table and an "ALL" table.

                          All of the data is on the "ALL" table and the thin table just has UniqueID and FoundCount calcs.
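 
                          Roughly like this (the names here are illustrative, not the real ones):
 
                              UniqueID      Text, indexed - matches the ID in the "ALL" table
                              cFoundCount   Unstored calculation  =  Get ( FoundCount )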

                           

                          Now (5 years later), ~ 3% of the records are active, all the rest are archived...and it runs just as fast as when new.

                          The record count in the "ALL" table is a few hundred thousand... the record count in the "Active" table is a few hundred. Speed is not compromised in day-to-day use.

                           

                          Good luck!

                          • 10. Re: 250,000 + records
                            brsamuel

                            The Count () function will need to evaluate every record in the found set. Sloooow. Instead, add an unstored calc to the table. The calculation should be "Get ( FoundCount )". When you perform a find for the active accounts, this new field will provide you with the same result, but do so instantaneously. This field is so useful, I make it a standard field in virtually every table I create - just in case I need it.
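 
                            Something like this, for example (the field name is just a convention I use, not a requirement):
 
                                cFoundCount  =  Get ( FoundCount )    // unstored calculation, number result
 
                            Perform the find for the active accounts, then read cFoundCount from any record in the found set - on a layout or in a script - and you get the size of the found set without evaluating each record the way Count () does.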

                             

                             

                            A most excellent blog article by Daniel Wood can be found here:

                            http://www.teamdf.com/weetbicks/17/a-lightning-fast-alternative-to-the-count-function

                             

                            hth

                            • 11. Re: 250,000 + records
                              jormond

                              You could also avoid the field itself if you don't need it to always refresh on its own. Just set a variable that holds the found count and display that instead, depending on what you need.
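 
                              For instance (a sketch only, with an arbitrary variable name):
 
                                  Perform Find [ Restore ]
                                  Set Variable [ $$activeCount ; Value: Get ( FoundCount ) ]
 
                              then show <<$$activeCount>> as a merge variable on the dashboard layout; it only changes when the script runs again.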