17 Replies · Latest reply on Mar 20, 2017 8:55 AM by peterdurant

    When is a file too big and what do you do about it?

    peterdurant

      I am creating a voter database with one layout and about 25 fields:

       

      Voter ID

Election 1 Name (e.g., 2016 Presidential Election)

Election 1 Date (e.g., 11/08/2016)

Election 1 Abr (e.g., PRES-16)

      Election 2 Name...

       

There are about 6 million unique voters, with anywhere from roughly 300,000 to 3.5 million people voting in each election (depending on the election). I have to import each election separately (if a voter ID shows up in the import file, that person voted). I Update Existing Records based on Voter ID and add remaining records as new.

       

I can only get about 2 elections imported before the import gets VERY slow (it can take up to 6 hours, whereas the first one took 5 minutes). By the 3rd election the whole thing crashes FM and I can get no further.

       

I have tried importing from Excel (as long as there were fewer than 1.4 million voters), I have tried making individual FM databases for each election and then importing them into a master FM database, and I have tried importing directly from a CSV file. Right now I have one database for each election, but dealing with 10 years of election history, with multiple elections per year, is getting a little unwieldy.

       

So my question is: is this just too big a solution for FM to handle, and either way, how would you handle this?

       

      Thanks,
      Peter

        • 1. Re: When is a file too big and what do you do about it?
          wimdecorte

Are you using FM15? We're working through an issue with 10-million-record data imports right now too. Chunking the data up into smaller imports seems to work, but it is cumbersome.
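For illustration, here is a minimal Python sketch of that chunking step, done outside FileMaker before the imports; the file name and chunk size are made up, and it simply splits one big CSV into pieces small enough to import one at a time:

```python
import csv

def chunk_csv(src_path, rows_per_chunk=250_000):
    """Split one large CSV into numbered chunk files, repeating the header."""
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)            # assumes the export has a header row
        out, writer, chunk = None, None, 0
        for i, row in enumerate(reader):
            if i % rows_per_chunk == 0:  # start a new chunk file
                if out:
                    out.close()
                chunk += 1
                out = open(f"{src_path}.part{chunk}.csv", "w",
                           newline="", encoding="utf-8")
                writer = csv.writer(out)
                writer.writerow(header)
            writer.writerow(row)
        if out:
            out.close()

chunk_csv("voters_2016.csv")             # hypothetical export file
```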

          • 2. Re: When is a file too big and what do you do about it?
            jmedema

One of the reasons the imports are slow likely has to do with FM trying to update the index for each indexed field. I'm not sure my description is accurate, so anyone is welcome to jump in to clarify and correct, but I believe that for each individual record you import, FM checks across all 25 fields (or however many are indexed) to be sure its indexes are up to date. When you are importing (or deleting) large data sets, FM's attempt to keep a perfectly accurate index as each - individual - record - is - added - or - deleted slows it down.

             

This can explain why your initial imports were so fast but your subsequent imports have slowed to a crawl: at the beginning the table was small, so there were far fewer indexed values to maintain with each import.

             

It's the impact of large-scale changes to the data set all at the same time, which is why chunking it up helps: FM doesn't have to swallow the whole thing in one gulp.

             

Back in the day when I was working with Soliant, there was a client that did massive data imports and exports (LOTS of records with LOTS of fields), and the attempts to update the indexes were the primary culprit. Their solution could not easily be replaced without lots of money, so they ended up turning off the indexing in Field Options, doing their imports and deletes, then turning it back on. That sped up the imports and deletes dramatically, and, when finished, they'd turn on the indexes they needed.

             

So, go into Manage Database and check which fields have indexes. Decide which fields actually need to be indexed (indexes primarily benefit Finds and Sorts), and turn OFF the indexes for the fields that don't.
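The index toggle itself lives in FileMaker's Manage Database dialog, so there is nothing to script there, but the drop-load-rebuild pattern described above is the same one bulk loaders use everywhere. A rough illustration in Python's sqlite3, with invented table and index names:

```python
import sqlite3

# Stand-in for the real file; the schema below is invented for illustration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE votes (voter_id INTEGER, election TEXT)")
con.execute("CREATE INDEX idx_voter ON votes (voter_id)")

rows = [(i, "PRES-16") for i in range(500_000)]   # fake import data

# "Turn off" the index (drop it), bulk-load, then rebuild it once at the
# end: one big sort instead of one index update per inserted record.
con.execute("DROP INDEX idx_voter")
con.executemany("INSERT INTO votes VALUES (?, ?)", rows)
con.execute("CREATE INDEX idx_voter ON votes (voter_id)")
con.commit()
```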

             

Again, I'm not sure if my understanding is accurate, but during imports and deletes of large record sets, I am guessing that FM is trying to manage a lot of this in cache. With that in mind, I'd also check how much RAM has been dedicated to FM - if it's low, FM could be compensating by caching to the hard drive. Check your RAM allocation in the FileMaker Pro menu > Preferences > Memory.

             

            All the best,

             

            Jim

            • 3. Re: When is a file too big and what do you do about it?
              peterdurant

Yes, we're using FM15. I can try breaking it down into smaller files, but that would be extremely time-consuming. Ugh!

               

I also tried turning off indexing for all of the fields, but that didn't seem to help either. I'll try upping the memory and see if that helps.

              • 4. Re: When is a file too big and what do you do about it?
                JonThatcher

                First, if it's crashing FileMaker, you should report the issue to FileMaker Support so they can reproduce it and get those pesky engineers to fix it. If some of the data you are using is public and can be provided to Support, that would be ideal.

                 

                Second, it might be good to take a step back and see if there is a different way of organizing the data that works for the solution you need. The single table with name, date and abbreviation per election *per voter* seems like it is going to cause a huge amount of duplicated data. I.e., if there are 300K voters in election PRES-16, you have 300K copies of the same election name, date, and abbrev. That seems pretty inefficient, and would explain the file getting huge and slow very quickly.

                 

                If your solution allows it, it seems like it would be better to have two tables:

                1. Elections

                electionID - Number - Unique - Auto-enter Serial number

                name

                date

                abbrev

                 

                2. Votes

voterID - Number (not unique: one record per election the voter voted in)

                electionID - Number

                 

                So assuming there is a single import file per election, for each import you would:

1. manually create ONE new record in Elections with the name/date/abbrev, recording the new electionID

2. import into the Votes table, always adding new records,

2b. with the electionID field auto-entered from a global field containing the just-created Elections::electionID

                 

                This means that instead of having one record per voter showing all elections they voted in, there are N records in the Votes table, one for each of the N elections that voterID voted in. Once the import is done and the VoterID field is indexed, it should be extremely fast to find all the records per voter, and display them as needed (even in one row, if you wanted). It should also be fast to find all the voterIDs that voted in a particular electionID, assuming that Votes::electionID is also indexed.
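To make the two-table shape concrete, here is the same design sketched with Python's sqlite3 (table and field names are ad hoc, and a real FileMaker solution would express this through relationships rather than SQL):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE elections (
        election_id INTEGER PRIMARY KEY,   -- the auto-enter serial
        name TEXT, date TEXT, abbrev TEXT
    );
    CREATE TABLE votes (
        voter_id    INTEGER,               -- NOT unique: one row per election voted in
        election_id INTEGER REFERENCES elections(election_id)
    );
    CREATE INDEX idx_votes_voter    ON votes (voter_id);
    CREATE INDEX idx_votes_election ON votes (election_id);
""")

# Step 1: ONE new Elections record per import file.
cur = con.execute(
    "INSERT INTO elections (name, date, abbrev) VALUES (?, ?, ?)",
    ("2016 Presidential Election", "2016-11-08", "PRES-16"))
election_id = cur.lastrowid                # plays the role of the global field

# Step 2: every voter ID in the import file becomes one new Votes record.
import_file_ids = [101, 102, 103]          # stand-in for the real import file
con.executemany("INSERT INTO votes VALUES (?, ?)",
                [(v, election_id) for v in import_file_ids])

# The index-backed lookup: all elections one voter took part in.
for (abbrev,) in con.execute(
        "SELECT e.abbrev FROM votes v "
        "JOIN elections e USING (election_id) "
        "WHERE v.voter_id = ?", (101,)):
    print(abbrev)                          # -> PRES-16
```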

                 

                The one downside of using this two table approach is if someone tries to view all elections for all voters at once. That would require FileMaker to basically recreate all the data duplication that the single-table method would have included, which would obviously be a huge amount of work and therefore slow. But hopefully that is not part of what your solution needs to support, and you can design the solution to prevent the user from purposefully (or accidentally) trying to work on too many records at once.

                 

                Hope that helps,

                Jon

                • 5. Re: When is a file too big and what do you do about it?
                  user27087

                  As a reference point, we have had no problems with the following running on a beefy Mac Pro:

                   

                  We import about 20M records every month into a table that has anywhere from 200M to 300M rows. Our largest single import is about 4M rows, and takes about 3 hours to import.

                   

4 indexed text fields, 45 number fields (6 of which are indexed), and 4 indexed calculations. We saw about 1,000 import rows per second when the table was below 100M records; it now imports about 500 per second. We have never had a crash on import or deletion. File size is about 80GB at the moment, and has been as high as 100GB.

                   

One possible advantage we have is that the 4 text fields (all indexed) draw from a limited set of possible values. The 6 indexed number fields are decimals, so their values are all over the map.

                   

                  Edit: We do all of this on FM Server 15, not FMP 15.

                  • 6. Re: When is a file too big and what do you do about it?
                    peterdurant

                    Jon,

                     

That may work. The only problem that might come up is that I then use that database as an external data source for my main voter database and populate voting fields, so that you can see at a glance which elections a person has voted in.

                     

                    But I think you may be on to something...

                    • 7. Re: When is a file too big and what do you do about it?
                      gdurniak

My guess is that "update records" is the bottleneck.

                       

Finding the matching record during the import, for up to 3 million voters, is a lot to ask.

                       

Is there any way to just bring in the records raw, then find another way to "roll up" the results by voter ID (perhaps a looping script to parse the import, using a temporary table)?

                       

                      greg

                       

                       

                      > I Update Existing Records based on Voter ID and add remaining records as new.

                       

> I can only get about 2 elections imported before the import gets VERY slow (can take up to 6 hours whereas the first one took 5 minutes). By the 3rd election the whole thing crashes FM and I can get no further.
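As a rough sketch of that "bring it in raw, then roll up" idea outside FileMaker (a looping script and a temporary table would do the same job inside it), with invented sample rows:

```python
from collections import defaultdict

# Raw import: one (voter_id, election_abbrev) row per vote, accumulated
# across every election file with NO "update matching" during the import.
raw_rows = [
    ("101", "PRES-16"),
    ("102", "PRES-16"),
    ("101", "GOV-14"),
]

# Roll-up pass: group by voter ID once, after everything is in, instead
# of matching each incoming record against 6M voters on the way in.
history = defaultdict(list)
for voter_id, abbrev in raw_rows:
    history[voter_id].append(abbrev)

for voter_id, elections in sorted(history.items()):
    print(voter_id, ", ".join(elections))
# 101 PRES-16, GOV-14
# 102 PRES-16
```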

                      • 8. Re: When is a file too big and what do you do about it?
                        jmedema

                        Peter,

                         

                        User27087 brings up a good point - what kind of hardware are you running this on?

                         

                        Jim

                        • 9. Re: When is a file too big and what do you do about it?
                          gdurniak

You could also store the data in a "real" SQL database, and either use it as an External Data Source or just query it for specific results (it would be dramatically faster).

                           

                          greg

                           

> So my question is: Is this just too big a solution for FM to handle and if so (or not) how would you handle this?
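To give a flavor of that approach, a small sketch with sqlite3 standing in for the "real" SQL back end (in practice this would be an ODBC/JDBC source); the database file name is hypothetical, and the schema is assumed to match the Elections/Votes layout from reply #4:

```python
import sqlite3

# Hypothetical database built outside FileMaker, assumed to contain the
# elections and votes tables from the earlier two-table sketch.
con = sqlite3.connect("voter_history.db")

# The SQL engine does the heavy aggregation; only one summary row per
# election ever needs to come back to FileMaker (or anywhere else).
query = """
    SELECT e.abbrev, COUNT(*) AS turnout
    FROM votes v
    JOIN elections e USING (election_id)
    GROUP BY e.abbrev
    ORDER BY turnout DESC
"""
for abbrev, turnout in con.execute(query):
    print(abbrev, turnout)
```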

                          • 10. Re: When is a file too big and what do you do about it?
                            gdurniak

                            Just for info,

                             

We once put a Mac Mini head-to-head with an Xserve, and saw only a 15% difference in performance.

                             

                            When FileMaker "hits a wall", new hardware won't help

                             

                            greg

                             

> what kind of hardware are you running this on?

                            • 11. Re: When is a file too big and what do you do about it?
                              user27087

I had missed the "update matching" part. That is definitely the creeping speed issue. FileMaker (especially Server) will not have a problem with millions of records in the way you describe UNLESS you must use import-and-update.

                               

Your best bet, if you can do this, is not to import and match records, but simply to import into an Election table. If you try to maintain a separate table of voters via import/update, you will hit almost the very same speed issue (6M unique voters), so...

                               

==> Simply import all records into the "many" table [Election Results] and determine whether you even need the unique Voter table. If not, report on that one table using sub-summaries.

                               

If you really need the unique Voter table, then see what happens when you export/summarize the large table (several elections' worth of imports). My guess is that it will be much faster than import-and-update.

                               

                              Scott

                              • 12. Re: When is a file too big and what do you do about it?
                                wimdecorte

                                gdurniak wrote:

                                 

                                Just for info,

                                 

We once put a Mac Mini head-to-head with an Xserve, and saw only a 15% difference in performance.

                                 

                                When FileMaker "hits a wall", new hardware won't help

                                 

                                greg

                                 

> what kind of hardware are you running this on?

                                 

Apologies for jumping on this, but that is very sketchy information, and it can't really help people make good hardware decisions.

Once = when?

What tests were done? Different actions stress different things.

What type and model of processor in the Mac Mini?

What type and model of processor in the Xserve?

What type and speed of disk I/O in each?

                                 

When FM hits certain walls, adding hardware WILL help. It depends on the nature of the problem.

                                • 13. Re: When is a file too big and what do you do about it?
                                  jfletch

I'm going to guess they had an i7 Mac Mini. Its single-core performance is about the same as a 2009 Xserve's.

                                   

Where the Xserve shines, though, is in its ability to take way more memory, and twice the cores. More cores mean more threads, and more threads mean more PSoS (Perform Script on Server) sessions and more users performing demanding tasks before the processors max out.

                                   

And that doesn't even begin to account for the more reliable and redundant hardware: multiple PSUs and multiple NICs. The two card slots allow for more expansion: PCI SSD drives, more and faster NICs, and SAN storage interfaces.

                                   

Xserves are still the tool of choice for the all-Mac shop that needs more horsepower without spending big bucks on new hardware. And they put off, just a little bit longer, the day when the in-house part-time IT person will have to learn Windows Server.

                                  • 14. Re: When is a file too big and what do you do about it?
                                    gdurniak

If adding hardware helps, then the hardware hit a wall, not FileMaker.

                                     

                                    greg

                                     

                                    > When FM hits certain walls, adding hardware WILL help
