Just For Fun:  A Gimmick To Scrape the Current Found Set

Document created by steve_ssh on Aug 13, 2018Last modified by steve_ssh on Aug 27, 2018
Version 12Show Document
  • View in full screen mode



Over the last few days I had fun playing around with a technique for gathering data from the current found set into well-formed JSON.  This post shares some of that experimentation.





In the Community, there has already been considerable discussion and exploration of the topic of gathering found set data through a variety of means.  However, to the best of my recollection, this particular technique has mostly been unmentioned.  I describe it as a "gimmick" technique, because it is novel, and a bit unconventional -- also, because I think that most of us will choose to do our found set scraping via other means, and so it seems like something that may not warrant side-by-side status with other more serious tools in our toolbox.


I made a demo file and video to share with a couple of developer friends with whom I enjoy exchanging all variety of innovation platform concepts and ideas.  I initially intended to refrain from posting the idea here in the Community, but then changed my mind, thinking that perhaps some value may come of sharing even the gimmick ideas.





This is directly inspired by an idea that I saw here on this forum.  In that thread, BruceRobertson posted about a technique involving using Replace Field Contents in a global field.


The Inquiry:


The inquiry that motivated this was to see what sort of possibilities the global-field-replace technique could offer with respect to gathering found set data into an arbitrary JSON structure without any need for pre-calculated JSON fields defined in the schema.  Of particular interest to me was the fact that, in contrast to walking the found set using GetNthRecord, running Replace Field Contents offered me an obvious way to reach out to related records from the context of each individual record in the found set ( something which had eluded me when using GetNthRecord ). In this way, the technique is similar to walking the found set using Go To Record, only, rather than being driven via looped scripting, it is driven by a single Replace Field Contents script step.





I think that there is no reason to prefer it over any other method for scraping the found set into well-formed JSON, and arguably, there may be some reason to avoid it.  Indeed, a scripted alternative better lends itself to debugging and cancelling, and a List or a List-Of Summary of pre-calculated JSON fields will perform much better.  If it does have a place in our bag of tricks, the sweet spot would be with small record sets.  All that said, this is a fun one, and I hope that some other developers will enjoy seeing it.


Thanks also go to Brian Dunning.  I used his free 500 record sample data file to create the demo file, along with a couple of other free data sources which I found courtesy of Google.


Elaboration follows.  Thank you for taking a look.




Schema requirement is a single global text field added to the table underlying the target context:

simple graph.png


The work of gathering the data from the records and related records of the found set takes place in Replace Field Contents script step:

one replace command.png

Two custom functions tidy up the technique:

two custom functions.png


Watch it in action and/or try it out:


• Sample file should be attached to this document.

• A video introduction is posted on YouTube.




Various observations, critique, pros/cons, etc.:


I definitely find the technique alluring, but I also feel that just about anything we can do with this technique, we can also do with a script that loops over the found set, and the conventional/standard nature of the script loop approach is more comforting to me -- especially in cases where someone else may inherit the code.  Below are some of the considerations that I have been looking at, with a little bit of commentary, which I hope may be helpful.


Consideration     Comment
Data sizeThis technique is best suited to smaller-sized found sets.  Related to the fact that this technique uses a single  $$_global_variable to build up data, and updates that variable with many iterations, it does not gracefully scale to large data sets.  I did experiment with variations that attempted to optimize by chunking data into multiple repeating variables, but I was not satisfied with the outcome.  I may make another attempt to try to optimize in this way, but I am not convinced that it will be a success.  BTW:  Credit goes to either JohnSindelar or jbarnum for explaining the performance hit related to continuously updating a variable with a large sized string.  I can't remember which of you made the point about an under-the-hood implementation which uses a non-mutable string object, and I'm not even sure if that explanation is still accepted as the correct one, but it has been very helpful to me to consider.  Thanks!
Minimal schemaThat this requires minimal added schema is a definite plus for me.  It would be even better if we could use a global text field located from any context -- then we could simply re-purpose the same field everywhere, but my investigations lead me to believe that the global field must be in the target context.
GrokabilityI am a bit concerned that tucking the JSON structure into a custom function which has field references in it will be too much of a code labyrinth.  It's definitely a departure from the norm, where custom function output is typically state/context independent -- and that could potentially lead to misunderstandings or confusion. On the positive side, configuring the technique requires no specialized learning beyond familiarity with the commonly used JSONSetElement function.  Note that I did experiment with other approaches which did not require the RowJSON CF, but the best balance I found between elegance and performance was the version posted here.
FlexibilityHaving critiqued the CF approach (see previous comment), I will say that I do like how it affords me the ability to update the structure without waiting for a table to recalculate/update a stored JSON field.  This, however, would be true with many other harvesting approaches, as well.
One-shot processBecause the Replace Field Contents runs as a "one-shot" process, we lose the advantages of a scripted loop which affords us the use of the Script Debugger.
Well-formed JSONNo real qualms here.  It uses JSONSetElement to build up the JSON objects, so the encoding and format should be fine.  The small exception is that for the variation that returns a dictionary, it does make some assumptions that the key values don't contain any special characters.  If that feature starts getting use, it would be best to look at tightening it up to ensure escaping.  It might already be ok, but it's something that would warrant review.
PerformanceI found that on small record sets, this technique seems to perform as well as a script that loops over GetNthRecord.  As the record count grew, a GetNthRecord loop performed better, but not so much that I would rule out the Replace Field Contents technique -- they still felt comparable.  Neither of these two techniques, however, can come close to the performance of pre-storing some calculated JSON in each record.  That's the trade-off that I see:  One either stores more data in the record, and gets super-fast harvesting, or one keeps the schema leaner, in exchange for greater time-overhead during a harvest.
VersatilityWhereas other techniques lend themselves to a variety of data-harvesting scenarios (e.g. harvest data related to just a single record, SQL use, etc.), I presently see only one scenario to which this technique nicely lends itself:  The current found set.  This is, however, an important scenario to be able to handle.
Related DataIt's been a while since I've used GetNthRecord in a looped situation to harvest data from a found set, but my recollection is that I concluded that if there is a way to include related data (data related to each individual records in the found set) that I just wasn't seeing it.  In contrast, this technique is able to grab such data in an obvious/clear manner.  Note that this is also something afforded by scripting a loop to walk through the records using Go To Record (Next).
Script TriggersAside from the small schema footprint, one of the alluring aspects for me was that this could be used without triggering an On Record Load script trigger that one might otherwise have to dodge if using a Go To Record approach.
User ExperienceWhen the Replace Field Contents starts to take longer than a brief moment, it shows a progress bar dialog, and announces that it is updating or replacing data.  I have mixed feelings about this:  As a user, I would find a progress bar more comforting than just a beach ball, but the message about updating data would probably confuse me.  Also note:  If Allow User Abort has not been turned off, the user would have a tempting button staring them in the face asking if they want to cancel the operation.
Misc.I did some research this morning, and FMI has confirmed that the Replace Field Contents script step is gluten-free.
11 people found this helpful