10 Replies Latest reply on Nov 3, 2012 8:17 AM by gdurniak

    How long will this take?

    dreed

      As I mentioned in a previous post, I've pulled down the entirety of our corporate analytical database from its old home to create a FileMaker database that will allow essentially read-only browsing and searching of the data, since we've moved to a new system where not all of the old data was completely mapped.

       

      The original SQL database consists of 57 tables, one of which, "RQST_TREE", has about 800 GB of BLOB data spread across over 1 million records. ODBC import was able to pull all this data into FileMaker as container data, and exporting the field contents proves that the original documents are intact and readable in their source applications. Since this is an 800 GB FileMaker table (which I will keep as its own separate file), I would like to use external container storage to ease management of this table in FileMaker Server. I started the transfer process to put the containers in secure external storage at noon today, and about 8 hours later it's at maybe 2% based on the progress bar (screenshot). I know it's a lot of data, but is this really going to take a couple of weeks? If so, I should probably quit the job and run it on another machine besides my laptop.

        • 1. Re: How long will this take?
          wimdecorte

           Nobody can tell how long it will take; it all depends on your hardware.  I certainly would not want to run this on a laptop.  You need a machine with lots of RAM and very fast disks to make this efficient.  A server, probably.

          • 2. Re: How long will this take?
            dreed

             The machine itself is pretty fast.  It has 16 GB of RAM and an 8-core i7 CPU.  The disk where I have the files is a 4-disk RAID 5 array connected to the laptop by USB 3.  The concern is more that I will probably need to use the laptop for other things.  I was hoping it would only take 1-2 days so I could let it run over the weekend and have it all done....

             

             I guess another question is: if I host this basically read-only table (file) on FileMaker Server, the backup would likely take a really long time.  But since the data won't change, I guess I could just keep offline backups and not run server backups on the file.  If that's the case, would there really be a reason to go through externalizing the containers?

            • 3. Re: How long will this take?
              CarstenLevin

               Having the files internal to the database file will force a complete 800 GB backup if you change so much as a comma.

              • 4. Re: How long will this take?
                gdurniak

                 You may get the prize for largest FileMaker file

                 

                Just curious ... How long did the original import take ?

                 

                 Strange that the transfer to external storage would be so much slower.  Perhaps this external disk is the bottleneck

                 

                greg

                 

                 

                > "RQST_TREE" has about 800 GB of BLOB data spread out across over 1 million records.  ODBC import was able to pull all this data into filemaker as container data, and exporting the field contents proves that the original documents are intact and readable

                • 5. Re: How long will this take?
                  tech_liaison

                  Have you configured your solution to use Secure Storage?

                   

                  Best,

                  Dave

                  • 6. Re: How long will this take?
                    dreed

                    That is a very good point.  I do have it configured to use secure storage.  I bet the encryption is killing me.  I will restart the job with that turned off.

                     

                     The original import took somewhere between 25 and 30 hours over a gigabit network.  If I have to do this again, I'll probably set up the field defaults and storage options after importing just a few records.  Then I can import the rest with the file already set up for external storage.  That way FileMaker won't have to do a huge file compact afterward to recover all the empty space, either.

                     

                     I'll keep you posted on the performance of this file, especially as it relates to using 360Works Scribe to pull keywords from all the supported container data for searching.

                    • 7. Re: How long will this take?
                      MartinCrosman

                       How are you finding Scribe's performance? Are you using it with native electronic documents or scanned PDFs with hidden OCR? Is it pulling the text as expected? Missing parts?

                       

                      Martin Crossman

                      • 8. Re: How long will this take?
                        dreed

                         Pretty much all the documents I'm working with are native electronic documents (Word, Excel, PowerPoint, PDF, TXT, and some others), and so far the extraction of text content has been pretty good.  The problem I've found is with scripting this across the large number of documents in the file, and the fact that the imported data doesn't carry the proper document name and extension on the container.  Since the column type in SQL was varbinary, there is no name (all files get the name Untitled.dat), but the name and extension are stored in two other columns.  I've been able to export the container contents to a temp file with a calculated name, and then read that file with Scribe.  But I also have the script calling a Send Event to delete the document when it's done reading it.

                         

                         The problem is that the document deletion spawns a new process on every loop iteration, and it sometimes gets behind.  This has caused the export to fail, since records with different record IDs can share the same filename (I should probably concatenate the record ID into the filename to make it unique; see the sketch below).  This, or some other issue with calling so many external events, caused FileMaker to hang during the extraction.  And the consistency check on an 800 GB file takes a LONG time.
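                          

                         Roughly, the loop looks something like this.  The field names are simplified placeholders rather than my real schema, and the Scribe function names are from memory, so check them against the plug-in documentation:

                            Go to Record/Request/Page [ First ]
                            Loop
                               # Prefix the record ID so two documents with the same stored name can't collide
                               Set Variable [ $path ; Value: Get ( TemporaryPath ) & Get ( RecordID ) & "_" & RQST_TREE::DocName & "." & RQST_TREE::DocExtension ]
                               Export Field Contents [ RQST_TREE::Document ; "$path" ]
                               # Load the exported file and pull its text with Scribe
                               Set Variable [ $load ; Value: ScribeDocLoad ( $path ) ]
                               Set Field [ RQST_TREE::ExtractedText ; ScribeDocReadText ]
                               # Deleting the temp file right here via Send Event is what spawns an OS
                               # process on every pass; batching the deletes at the end (or just letting
                               # the temporary folder clean itself up at quit) would avoid that
                               Go to Record/Request/Page [ Next ; Exit after last: On ]
                            End Loop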

                         

                         I've also checked out using ScriptMaster to rename the file within the container, and that works, but it doesn't really just rename it; rather, it streams the binary data out of the container and puts the stream back into the original field with a new name.  This will slow down the process as well.

                         

                        I figure the best approach is still externalizing the container data, so at least it will be a bit easier to recover if I do something that causes the indexing job to crash.

                        • 9. Re: How long will this take?
                          Stephen Huston

                          Secure storage is actually the faster option for this process. The "encryption" is not of the container data, but of the storage structure naming conventions. "Secure" also protects against the OS problems encountered if you try to save thousands of files into the same OS-level directory, which can grind this process to a halt even with just a few thousand records. Stay "secure."

                          • 10. Re: How long will this take?
                            gdurniak

                            It seems the data is encrypted by default

                             

                            e.g.  "Alternatively, you can choose to keep the data in its native format through open storage"

                             

                            The help article is here:

                            http://help.filemaker.com/app/answers/detail/a_id/10244/~/storing-container-field-data-externally

                             

                            greg

                             

                             

                            > The "encryption" is not of the container data, but of the storage structure