8 Replies · Latest reply on Mar 28, 2017 6:59 AM by srzuch

    What is the best way to process large data files (many GB)?

    suesaunders

      I'm processing large financial data files.  We have about 600GB of text files (tick data) that are being imported into around 32 FileMaker files.  The next step is to merge the FileMaker files into several composite files (around 8) which are then each processed using different trading strategies.

       

      We have a high-end machine running Windows 10, but FileMaker only uses one of the 16 cores available, which means it is taking forever to process the data, particularly the imports.  Is it possible, in some way, to have multiple copies of the FM client application open, each processing data and each using one of the available cores?  The operating system would allocate one process to each core.  It is frustrating that we have plenty of processing capacity but do not know how to make use of it.  Each FM client application is well able to handle the large files, but in order to process all the data in a realistic time frame, I need to come up with a better solution.

       

      I have a licence for 5 instances of FM and have FM Server on another machine.  However, the data is on the Windows machine, and it is by far the fastest beast that we have.

       

      Sue Saunders

        • 1. Re: What is the best way to process large data files (many GB)?
          bigtom

          FMP runs as a single-threaded, single-core application.  FMS is multi-thread capable and will use multiple cores.

           

          How you manage multiple clients processing, or use FMS to do the processing, really depends on what you are doing and how you are doing it. Some things on FMS will still be single-threaded. If you host the files on FMS and can have multiple clients processing different data sets at the same time, that will speed things up.

           

          Whether to build a direct server-side process or run multiple PSOS (Perform Script on Server) calls really depends on what you are doing.

          • 2. Re: What is the best way to process large data files (many GB)?
            gdurniak

            We have a similar "problem": over 600 GB of data, and growing, in about 16 tables

             

            Since our data is numeric, and FileMaker is poor at number crunching, we use Sybase SQL Anywhere, which is dramatically faster

             

            We then use FileMaker as a "front end", building queries "on the fly" (with a JDBC plugin) to get the results we need
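
            The plugin hides the boilerplate, but under the hood the query is plain JDBC. To give a feel for it, here is a stripped-down standalone Java sketch ( the table and column names, and the connection URL, are made up for illustration -- use the URL format and driver for your SQL Anywhere version ):

            import java.sql.Connection;
            import java.sql.DriverManager;
            import java.sql.PreparedStatement;
            import java.sql.ResultSet;

            public class TickQuery {
                public static void main(String[] args) throws Exception {
                    // Placeholder URL / credentials -- check the SQL Anywhere docs
                    String url = "jdbc:sqlanywhere:uid=dba;pwd=sql;host=bigbox";
                    try (Connection conn = DriverManager.getConnection(url)) {
                        // Query built "on the fly"; a PreparedStatement keeps the
                        // variable values out of the SQL string itself
                        String sql = "SELECT tick_time, price, volume FROM ticks "
                                   + "WHERE symbol = ? AND tick_time >= ? "
                                   + "ORDER BY tick_time";
                        try (PreparedStatement ps = conn.prepareStatement(sql)) {
                            ps.setString(1, "AAPL");
                            ps.setTimestamp(2,
                                java.sql.Timestamp.valueOf("2017-01-03 09:30:00"));
                            try (ResultSet rs = ps.executeQuery()) {
                                while (rs.next()) {
                                    System.out.printf("%s  %.4f  %d%n",
                                        rs.getTimestamp(1), rs.getDouble(2), rs.getLong(3));
                                }
                            }
                        }
                    }
                }
            }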

             

            If it is a compatible flavor, the SQL database can also be a seamless External SQL Source

             

            For special processing of any data we do store in FileMaker, we import from the server into files stored and run locally on each desktop. We then run multiple reports simultaneously (each can take up to 3 hours) on multiple machines.

             

            greg

             

            > Each FM client application is well able to handle the large files, but in order to process all the data in a realistic time frame, I need to come up with a better solution

            • 3. Re: What is the best way to process large data files (many GB)?
              wimdecorte

              You may want to look into pre-processing the text files outside of FM using OS-level tools.  Those are going to be much faster.  Then only import the finalized data into FM.
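
              For example -- not an OS-level tool strictly speaking, but a plain Java sketch of the same idea: stream the raw files line by line into one merged, cleaned file before FileMaker ever sees it. The directory name, file pattern, and header filter below are hypothetical:

              import java.io.BufferedReader;
              import java.io.BufferedWriter;
              import java.io.IOException;
              import java.nio.file.DirectoryStream;
              import java.nio.file.Files;
              import java.nio.file.Path;
              import java.nio.file.Paths;
              import java.util.ArrayList;
              import java.util.Collections;
              import java.util.List;

              public class MergeTicks {
                  public static void main(String[] args) throws IOException {
                      // Collect the raw tick files and sort them so the merged
                      // output stays in a predictable order (names are hypothetical)
                      List<Path> inputs = new ArrayList<>();
                      try (DirectoryStream<Path> ds =
                              Files.newDirectoryStream(Paths.get("raw_ticks"), "*.txt")) {
                          for (Path p : ds) inputs.add(p);
                      }
                      Collections.sort(inputs);

                      try (BufferedWriter out =
                              Files.newBufferedWriter(Paths.get("composite_1.txt"))) {
                          for (Path in : inputs) {
                              try (BufferedReader reader = Files.newBufferedReader(in)) {
                                  String line;
                                  while ((line = reader.readLine()) != null) {
                                      // Hypothetical clean-up: drop blank lines and repeated
                                      // header rows so only finalized data rows get imported
                                      if (line.isEmpty() || line.startsWith("Date,")) continue;
                                      out.write(line);
                                      out.newLine();
                                  }
                              }
                          }
                      }
                  }
              }

              Because it streams, memory use stays flat no matter how many GB the files are.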

               

              Or use another supported SQL backend database and use ESS to work with it.

              • 4. Re: What is the best way to process large data files (many GB)?
                fmpdude

                I agree with Wim here.

                 

                In a recent item here on the forums, I was able to process about 540 million (540,000,000) records with 18 billion (18,000,000,000) calculations -- including disk access and writing an output text file for all 540 million records -- in about 2 minutes using Java. In FileMaker, this same task took well over an hour.

                 

                While FileMaker is excellent for many things, huge loops over lots of data aren't, in my view, one of them. Since FMP scripts are interpreted rather than compiled, they run very slowly for large tasks -- especially loops. The Java code I wrote to help the OP was only about 100 lines and did most of the calculations at memory speed. Even when updating the actual back-end table (all 540 million records, AND writing the text file too), the Java code took only about 30 minutes.
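
                The shape of it is nothing exotic -- just a buffered read-calculate-write loop. Here is a stripped-down sketch (the tab-delimited field layout and the running calculation are stand-ins, not the actual code from that thread):

                import java.io.BufferedReader;
                import java.io.BufferedWriter;
                import java.io.IOException;
                import java.nio.file.Files;
                import java.nio.file.Paths;

                public class CrunchTicks {
                    public static void main(String[] args) throws IOException {
                        // Buffered streams keep disk access sequential; the math runs
                        // at memory speed between reads (file names are placeholders)
                        try (BufferedReader in = Files.newBufferedReader(Paths.get("ticks.txt"));
                             BufferedWriter out = Files.newBufferedWriter(Paths.get("results.txt"))) {
                            String line;
                            double total = 0;
                            long count = 0;
                            while ((line = in.readLine()) != null) {
                                // Assumed layout: timestamp<TAB>price<TAB>volume
                                String[] f = line.split("\t");
                                double price = Double.parseDouble(f[1]);
                                long volume = Long.parseLong(f[2]);
                                // Stand-in calculation: running mean of price * volume
                                total += price * volume;
                                count++;
                                out.write(f[0] + "\t" + (total / count));
                                out.newLine();
                            }
                        }
                    }
                }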

                 

                Once you get the data the way you want it, export it to CSV (still outside FMP) and then import it into FMP, where you can take advantage of FMP's strengths, including data management (native Finds, GTRR, etc.).

                 

                I'll try to help if I can.

                • 5. Re: What is the best way to process large data files (many GB)?
                  suesaunders

                  Thanks, all, for the help.

                   

                  It seems that we have found a workaround.  Using the network, we have put 3 Macs and 1 PC to work, all working off the big machine's hard drive.  Although this is not ideal and each import is taking forever, for the time being we are making progress.

                   

                  So far we have found that the Windows machine stops importing after a while for no apparent reason.  But the Macs don't mind and, although slow, just keep on going until all the data is imported.

                   

                  It is all puzzling.  I am only importing into 3 fields.  I have only 2 calc fields.  It's a simple little script that essentially says "import".  I would have thought that FileMaker would eat it!

                  • 6. Re: What is the best way to process large data files (many GB)?
                    gdurniak

                    Can you please clarify?

                     

                    Do all 4 machines open a file hosted on the server, then do simultaneous imports?

                     

                    That won't work, and will take forever.

                     

                    For bulk imports, we use the Admin Console to close the target file, then move the file from the server to a "local" workstation and do the import there.  When done, we move the file back to the server.  ("Local" processing is much faster than client-server.)

                     

                    Otherwise, the bulk import overwhelms the server, and all users see horrible performance until it is done.

                     

                    greg

                     

                    > Using the network, we have put 3 Macs and 1 PC to work, all working off the big machine's hard drive

                    • 7. Re: What is the best way to process large data files (many GB)?
                      suesaunders

                      Just to clarify: I was not using FMS.  All imports were happening between local machines via the local network.  It's just that the data resided on a Windows machine's hard drive, and the multiple Macs were accessing that hard drive for the imports over the network.  It all worked well in the end, but each import took many hours.  Maybe there was a better way, but the job is done now.

                       

                      Thanks for your suggestions though.

                      Best regards

                      Sue

                      • 8. Re: What is the best way to process large data files (many GB)?
                        srzuch

                        Maybe I missed it, but what type and how fast is your hard disk drive?  You are reading and writing a lot of data, and clearly pushing the disk drive.  Upgrading to a fast SSD may be worth it.

                         

                        Also, generally, processing over a network connection, especially peer-to-peer, is much slower than doing all the work on one machine.

                         

                        Steve
