High Performance Text - List of Values

Idea created by Chris Irvine on Apr 26, 2016
    Active
    Score: 11

    Making internal optimizations to the way text is stored and accessed could significantly improve FileMaker's text processing performance.

     

    The evolution of FileMaker's Value Lists and text functions has codified common ways of dealing with n chunks of data, a.k.a. lists of values. Patterns like the following are not uncommon:

     

    Set Variable [ $items ; Value: List ( "apple" ; "orange" ; "banana" ) ]
    Set Variable [ $second_thing ; Value: GetValue ( $items ; 2 ) ]
    Set Variable [ $items ; Value: List ( $items ; "kiwi" ) ]
    Set Variable [ $next_to_last ; Value: GetValue ( $items ; ValueCount ( $items ) - 1 ) ]
    

     

    Unfortunately, as many people have learned, the performance of these structures degrades as the size of $items increases. Let's change that!
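To illustrate why this kind of slowdown can happen, here is a minimal Python sketch. It assumes (this is a guess for illustration, not a statement about FileMaker's actual internals) that a list of values is held as one flat string delimited by carriage returns. The names `get_value` and `append_value` are hypothetical stand-ins for GetValue() and List().

```python
SEP = "\r"  # assumption: values are separated by a carriage return, as with FileMaker's ¶

def get_value(text: str, n: int) -> str:
    """Return the n-th (1-based) value by scanning from the start of the string."""
    if n < 1:
        return ""
    start = 0
    # Skip n - 1 separators; the work grows with the index requested.
    for _ in range(n - 1):
        nxt = text.find(SEP, start)
        if nxt == -1:
            return ""  # fewer than n values
        start = nxt + len(SEP)
    end = text.find(SEP, start)
    return text[start:] if end == -1 else text[start:end]

def append_value(text: str, value: str) -> str:
    """Return a new string with value appended; the whole old string is copied."""
    return value if text == "" else text + SEP + value

items = SEP.join(["apple", "orange", "banana"])
second_thing = get_value(items, 2)
items = append_value(items, "kiwi")
next_to_last = get_value(items, 3)
```

Under that assumption, each lookup walks past every earlier value and each append copies the entire string, so a loop over a large list does work that grows roughly with the square of its length.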

     

    What if appending a single element to a large list of items took the same amount of time as appending to a small list? What if accessing a value with GetValue() was unaffected by the number of items, or by the index of the element retrieved?

     

    I picture a system where the performance of these functions is similar for variables containing 100 values (lines), 1k values, or perhaps 100k values.

     

    There are probably at least a dozen academically interesting data structures that could be applied internally to help with this problem. I'm thinking something as simple as a B-tree might provide huge gains for many of these functions. With this specific approach, a block of text arriving in a variable for the first time would be scanned for line breaks and then broken up into a B-tree for storage. (Yes, this might add a little time compared to the current process.) The GetValue() function would traverse the tree to retrieve the correct item. Maybe "copies" could be performed quickly using pointers and reference counters.
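As a rough sketch of the idea above, the following Python code splits text on line breaks once, stores the values in a tree whose nodes carry value counts, and answers lookups by descending the tree. It is a simplified, append-only B-tree-like structure rather than a full B-tree (no deletion, no rebalancing), and everything in it (`ValueList`, `Node`, `FANOUT`, the carriage-return separator) is a hypothetical name or assumption chosen for illustration, not anything FileMaker exposes.

```python
SEP = "\r"    # assumption: values are separated by a carriage return
FANOUT = 16   # assumption: maximum values per leaf and children per internal node

class Node:
    def __init__(self, values=None, children=None):
        self.values = values      # list of strings for a leaf, else None
        self.children = children  # list of Nodes for an internal node, else None
        self.count = len(values) if values is not None else sum(c.count for c in children)

def _append(node, value):
    """Append along the rightmost path; return a new right sibling on overflow, else None."""
    if node.children is None:                      # leaf
        if len(node.values) < FANOUT:
            node.values.append(value)
            node.count += 1
            return None
        return Node(values=[value])
    overflow = _append(node.children[-1], value)
    if overflow is None:
        node.count += 1
        return None
    if len(node.children) < FANOUT:
        node.children.append(overflow)
        node.count += 1
        return None
    return Node(children=[overflow])               # same height as node

class ValueList:
    def __init__(self, text=""):
        # One-time scan for separators (trailing-separator handling is ignored here).
        values = text.split(SEP) if text else []
        level = [Node(values=values[i:i + FANOUT])
                 for i in range(0, len(values), FANOUT)] or [Node(values=[])]
        while len(level) > 1:                      # build parents bottom-up
            level = [Node(children=level[i:i + FANOUT])
                     for i in range(0, len(level), FANOUT)]
        self.root = level[0]

    def value_count(self):
        return self.root.count

    def append(self, value):
        overflow = _append(self.root, value)
        if overflow is not None:                   # root was full: grow one level
            self.root = Node(children=[self.root, overflow])

    def get_value(self, n):
        """Return the n-th (1-based) value, or "" if out of range."""
        if n < 1 or n > self.root.count:
            return ""
        node = self.root
        while node.children is not None:
            for child in node.children:            # at most FANOUT steps per level
                if n <= child.count:
                    node = child
                    break
                n -= child.count
        return node.values[n - 1]

items = ValueList(SEP.join(["apple", "orange", "banana"]))
items.append("kiwi")
next_to_last = items.get_value(items.value_count() - 1)
```

In this sketch both `append` and `get_value` touch one node per tree level, so their cost grows with the logarithm of the number of values instead of with the length of the text. The fast "copies" mentioned above (shared nodes with reference counts) are not modeled here.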

     

    To be clear, I'm not asking for any new functions or publicly available data structures. I'm just asking that things get way faster when dealing with large text variables.