Deferred & multithreaded cache updating for bulk operations

Idea created by Vincent_L on Jul 7, 2018

    Hi,

     

    Caching plays a big role in FileMaker. It makes quick searching possible and enables relationships.

    Therefore, if a field is indexed, FileMaker has to update its index if there's a change.

    That cache-updating process causes a lot of slowdown in bulk operations, where many records get updated at once. This is especially the case for imports (Replace Field Contents, New Record + Set Field…), which can get very slow.

    That slowness not only makes the user wait; it can also steal resources from the server, slowing down the server and hence all its users.

    Moreover, it may also lock the table, or at least slow down reads of the table for other users.

     

    So cache updating is a big performance concern in FileMaker. But it could be much faster:

     

    Apparently, the cache update is triggered per field and per record, one by one. That means that if you update a thousand records, with 5 field updates each, there will be 5,000 cache updates. And it seems to be mono-threaded, so a single CPU core updates the cache of field 1, then field 2, and so on up to field 5, then moves on to the next record and starts again.

    This also means that if you have only 5 distinct values in field 1 across your 1,000 records, the cache updating will do 1,000 lookups just to check whether each value already exists.
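    Here's a minimal sketch of that difference (Python, with the index stood in for by a plain dictionary; the names are illustrative assumptions, not FileMaker's actual internals):

        from collections import defaultdict

        # Hypothetical in-memory index: value -> set of record IDs.
        index = defaultdict(set)

        # 1,000 records whose field cycles through only 5 distinct values.
        records = [(rid, rid % 5) for rid in range(1000)]

        # Current behaviour as described above: one index lookup per record,
        # i.e. 1,000 probes even though only 5 distinct values exist.
        for rid, value in records:
            index[value].add(rid)

        # Deferred behaviour: group by distinct value first, outside the index,
        # then touch the real index only 5 times instead of 1,000.
        groups = defaultdict(list)
        for rid, value in records:
            groups[value].append(rid)
        for value, rids in groups.items():
            index[value].update(rids)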

     

    So the proposed idea is, for bulk operations, to do all the cache updating in one go, rather than one by one, and also to spawn each field's update in its own thread when possible (not for stored calc fields).

     

    There are plenty of ways to optimize this by leveraging the fact that it would happen once all the records are known. I'll describe a couple below that could be among the fastest (and would have other big perks as well).

     

    Please, let's not debate the exact method; FMI will find one.

     

    Perhaps the simplest way to implement it:

     

    Do the bulk operation without caching, then afterwards launch the cache update on the fields that have to be indexed.
    To do so, load into memory* all the data of the indexed fields of all the records concerned, and spawn each column (i.e. field values + record IDs) into its own thread (so it can leverage multiple CPU cores).

    For each column, keep only the distinct values and their record IDs (sorting the column first might be faster, since it avoids scanning all the records for every distinct value). Then finally update the real cache with the new values.
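    A minimal sketch of that approach (Python; build_column_index and the column layout are assumptions for illustration, not FileMaker's actual internals):

        from collections import defaultdict
        from concurrent.futures import ThreadPoolExecutor

        def build_column_index(column):
            # Keep only distinct values, each mapped to its record IDs.
            # (A real engine might sort first and emit runs of equal values.)
            distinct = defaultdict(list)
            for record_id, value in column:
                distinct[value].append(record_id)
            return distinct

        def rebuild_indexes(columns):
            # columns: {field_name: [(record_id, value), ...]} for the indexed
            # fields of all the records touched by the bulk operation.
            # Each column is deduplicated in its own thread, then the results
            # are merged into the real cache in one go.
            with ThreadPoolExecutor() as pool:
                futures = {name: pool.submit(build_column_index, col)
                           for name, col in columns.items()}
            return {name: fut.result() for name, fut in futures.items()}

        # Example: 1,000 updated records with two indexed fields.
        columns = {
            "status":  [(rid, rid % 5) for rid in range(1000)],
            "country": [(rid, rid % 20) for rid in range(1000)],
        }
        new_entries = rebuild_indexes(columns)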

     

     

    Another way would open the door to full transaction support for FileMaker:

     

    Don't write anything before receiving all the data (import or Set Field). Just gather all the data in memory*. That would also speed up the data generation, since nothing has to actually be written yet.

    Then compute the cache for each column in separate threads. When everything is computed, write it to the FileMaker file and update its cache in one go. If that fails for some reason, abort; or optionally (if the user so chooses), write only the records that could be written and warn the user about the records that encountered issues (with record ID lists).
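    A minimal sketch of that flow (Python again; write_record is a hypothetical callback standing in for the engine's file writer, and indexed_fields lists the fields whose cache must be rebuilt):

        from collections import defaultdict
        from concurrent.futures import ThreadPoolExecutor

        def bulk_import(rows, indexed_fields, write_record, partial=False):
            # rows: {record_id: {field: value}}, fully gathered in memory
            # before a single byte hits the file.

            # 1. Compute the new cache entries per column, one thread per field.
            def column_index(field):
                distinct = defaultdict(list)
                for record_id, values in rows.items():
                    distinct[values[field]].append(record_id)
                return field, distinct

            with ThreadPoolExecutor() as pool:
                new_entries = dict(pool.map(column_index, indexed_fields))

            # 2. Write the records: abort on the first error, or collect the
            # failures if the user opted into a partial commit.
            failed = []
            for record_id, values in rows.items():
                try:
                    write_record(record_id, values)
                except IOError:
                    if not partial:
                        raise  # abort the whole batch
                    failed.append(record_id)

            # 3. Apply the precomputed cache entries in one go, skipping any
            # records that failed to write, and report the failures to the user.
            for distinct in new_entries.values():
                for value, ids in distinct.items():
                    ids[:] = [i for i in ids if i not in failed]
            return failed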

     

    This would speed up countless FileMaker operations by removing a big performance roadblock in the platform.

     

    On the topic of caching, you may also be interested in this idea, which deals with record deletions:

    Faster Records deletion, possibly by deferring their index deletion and related deletes optimization

     

     

    * Don't worry, nowadays there's more than enough free memory. And if there isn't, it could rely on OS virtual memory, or predict the memory usage beforehand and revert to the old method if there isn't enough. But VM will handle it nicely.