
Indexing very large datasets

Question asked by everyman on Nov 19, 2015
Latest reply on Nov 22, 2015 by everyman

What is the conventional (or unconventional) wisdom on importing very large datasets?

 

We have a database that grows by about 3.1 million records each year, all at once.  Much of this data (mostly the text data) is fairly static and doesn't change from year to year, so I have put it into separate tables.  But the meat of the thing is a set of number fields and number calculations; we get this data about this time every year and import the new annual records.  For better or worse, almost all of these fields need to be searchable, which means importing and indexing take longer every year.

 

Are there any accepted "best practices" for importing large datasets?  I am importing (from an Access table) into a single-table "import" FileMaker database.  That file is set up as an external data source in my main database.  (Since the import table and its data are only needed once a year, I moved them to a separate file; deleting 3.1M records takes a while.)  The import routine uses lookups to create new records and will, if necessary, update existing records in the more static tables.
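
For what it's worth, the per-record logic the import routine performs is roughly the following.  This is just a minimal Python sketch to illustrate the flow, not actual FileMaker script steps, and the field names are placeholders rather than our real schema:

```python
# Sketch of the create-or-update pass the import script performs per row.
# "properties" and "owners" stand in for the static tables, keyed by their
# match fields; "assessments" stands in for the annual assessment table.
# All field names here (parcel_id, owner_id, etc.) are placeholders.

def import_rows(rows, properties, owners, assessments):
    for row in rows:
        # Static text data: create the record if the key is new,
        # otherwise refresh it in place (the lookup/update step).
        prop = properties.setdefault(row["parcel_id"], {})
        prop.update({"address": row["address"]})

        owner = owners.setdefault(row["owner_id"], {})
        owner.update({"name": row["owner_name"]})

        # Annual numeric data: always a brand-new assessment record,
        # linked back to property and owner by their keys.
        assessments.append({
            "parcel_id": row["parcel_id"],
            "owner_id": row["owner_id"],
            "year": row["tax_year"],
            "assessed_value": row["assessed_value"],
        })
```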


The basic structure is three tables: property, owner, and tax assessment.  The assessment table gets new entries each year; each entry links to the property description and the owner information.
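
To make the linkage concrete, the relationships look roughly like this (again, the field names are made up for illustration, not the actual field definitions):

```python
from dataclasses import dataclass

@dataclass
class Property:          # static; one record per property
    parcel_id: str
    address: str

@dataclass
class Owner:             # static; changes only when ownership data changes
    owner_id: str
    name: str

@dataclass
class Assessment:        # new records every year; numeric, mostly searchable
    parcel_id: str       # links to Property
    owner_id: str        # links to Owner
    year: int
    assessed_value: float
```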


Thanks for any light you can shed on this.  We need to keep the import time as brief as possible.  It currently takes on the order of a weekend to import the data, and I'd like that to stay fairly linear: if after a few years it starts to look like three days (and then four and five), I'll go back to keeping each year's data in its own annual file.
