ForEach[Values] new script step for major performance boost

Idea created by Vincent_L on Apr 26, 2016
    Active
    Score2

    Hi,

     

    I needed to process a 500K rows Value list. But, to my dismay the  code below, that I use in most of my scripts, is horrendously slow, with if my value list is more than a few thousands items.

    Worse, it's exponentially slower :

    20K rows list is 59 seconds, 40K is 5 minutes, 80 K is 19 minutes, 500K : more than 12 hours later it was only at 250K…

     

    Set Variable [$list_l; $valuelist]

    Set Variable [$l; Value:0]

    Set Variable [$Nbre_l; Value: ValueCount ( $list_l )]

    Loop

      Set Variable [$l; Value:$l+1]

      Set Variable [$myValue_l; Value:GetValue ( $list_l; $l )]

      # Put here the stuff you want to do

     

    whatever steps I need to do…

     

      Exit Loop If [$l ≥ $Nbre_l]

    End Loop

     

    Obviously, I first thought about asking FMI to accelerate drastically GetValue ( listOfValues ; valueNumber ), but even though there's certainly optimizations they can make, I now think the slowness is elsewhere.

    In fact, the above script is pushing 500 Thousands times a 500K row value list to the GetValue function. If each values is a 1K chars string : that's 500  * 500 * 1000 = 250  MB

    And then each time the GetValue function has to find the correct row out of 500K rows. That's a lot of overheard, and that's a lot of unnecessary  searches.

    This explains why the executing time is exponentially slower and slower as the value list is bigger.

     

    One way to speed things up is to cut the value list in chunks. But if I cut it in 1K rows chunks, it takes 1s / 1K chunk, that's still 500 seconds = 8"30 still a lot.

     

    The crux of the problem is that I just need (as well as all people wanting to process values), to iterate through the 500K values, just reading the next line till the end, I, and we in most case, don't need to read an arbitrary line, but just the next one. And I just need to load once the 500k in memory. But currently it's impossible to do in filemaker. Because using any function, passing to any plug-in, will mean to move the 500K rows * 500K times. So the overhead will be always there.

    What filemaker would need, is a way to iterate through the big value list, remembering it's position for the next iteration. That's what ForEach is for.

     

    So the proposed scrip steps would be

     

    ForEach[Values]

    set variable [$CurrentValue;Get ( ScriptResult )]  // of course we need a way to get the current row contents, I propose Get ( ScriptResult ) but could be a new function like Get(LastForEachResult)

    my scrip step 1

    mu script sept 2

    etc..

     

    End ForEach

     

    This would be tremendously more efficient, saving a lot of memory movements.

    This would benefit everybody, moreover its' much more clean and easier to understand than my previous code (forced is also very common in many languages so less steep learning curve).

     

    This is especially important for those using virtual list techniques, and those using ExecuteSQL, they both scream for that, and of course all people needing to process more than 10K values

     

     

     

     

    P.S : of course I tried a lot of alternatives, with Position etc, they're much slower