10 Replies Latest reply on Apr 19, 2013 10:25 AM by mikebeargie

    context-free data scrambler

    mikebeargie

      I'm looking into making a script that will go through an entire database, and replace all the data with random strings.

       

      I'm already able to do the replace function I want, making an alternative of this function http://www.briandunning.com/cf/533 that will replace numbers with random numbers, and text strings with random text (respecting spaces).
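
To make the replace rule above concrete, here is a minimal sketch in Python rather than FileMaker calculation syntax. It is not the RandomizeIt custom function used later in this post, nor the Brian Dunning function it is based on. The name `scramble`, the `rng` parameter, and the choice to preserve letter case and leave punctuation untouched are assumptions made for illustration.

```python
import random
import string


def scramble(value, rng=random):
    """Replace digits with random digits and letters with random letters.

    Spaces and any other characters are kept as they are, so the shape of
    the original value survives. Preserving case is an assumption here.
    """
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            # Spaces, dashes, and other punctuation pass through unchanged.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(scramble("John Smith 555-1234"))
```

Passing a seeded `random.Random` instance as `rng` makes a run repeatable, which can help when testing a scrambled clone.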

       

      But I'm still wondering whether it's possible to make this context-free and fast. Ideally, I'd drop this custom function and script into a file, run it, and have all the data in the entire file replaced with random text. Basically, I want to clone a file but replace the data with random data for security purposes. Assume also that I am running this on raw data layouts or a data separation file.

       

      So I came up with something like:

       

      Set Variable [ $iteration = 1 ]
      Loop
        Go to Layout [ by number: $iteration ]
        Set Variable [ $fieldlist = FieldNames ( Get ( FileName ) ; Get ( LayoutName ) ) ]
        Set Variable [ $fieldcount = ValueCount ( $fieldlist ) ]
        Show All Records
        Go to Record/Request/Page [ First ]
        Loop
          Set Variable [ $subiteration = 1 ]
          Loop
            Set Variable [ $fieldname = GetValue ( $fieldlist ; $subiteration ) ]
            Set Field By Name [ $fieldname ; RandomizeIt ( GetField ( $fieldname ) ) ]
            Set Variable [ $subiteration = $subiteration + 1 ]
            Exit Loop If [ $subiteration > $fieldcount ]
          End Loop
          Go to Record/Request/Page [ Next ; Exit after last ]
        End Loop
        Set Variable [ $iteration = $iteration + 1 ]
        Exit Loop If [ $iteration > number of layouts to run this script on ]
      End Loop

       

      Does anyone have thoughts on this, or have you done something similar to obfuscate your data? This is workable for me (a more complex version is needed for error checking and such), but it runs slowly because of the subloops: you can't do a Replace Field Contents command based on a calculated field name.

        • 1. Re: context-free data scrambler
          steve_ssh

          Hi Mike,

           

          That sounds like a really fun project.

           

          Have you considered doing a performance test between the methodology that you sketched out versus trying to do the same thing via SQL updates with a plugin (e.g. BaseElements plugin)?

           

          I believe that the FileMaker metadata tables should provide you access to the information that you would need regarding field names and types.

           

          At the end of the day, I believe you would still be performing an update on one record/row at a time, so the big questions as I see them are:

           

            - Can you tolerate using a plugin?

             - Which incurs more overhead?

             - Which makes error detection easier?

             - Which is easier to customize?

           

          One thing that I find attractive about using the metadata tables is that you should be able to detect an error condition in which one of your data layouts was not updated after a new field was added to the schema. The flip side is that your layout approach lets you selectively omit certain fields from the munge process, e.g. a field that validates against a value list of defined strings, where munging would produce invalid data.

           

          I would be curious to find out what you settle on, should you be willing to keep us updated with your progress.

           

          Best and good luck!

           

          -steve

          • 2. Re: context-free data scrambler
            wimdecorte

            You could use a goto field to step through each field in the tab order, check the active field name against your list of fields to handle and do a replace "in place" without specifying a target field.

            • 3. Re: context-free data scrambler
              mikebeargie

              I had thought about that, and might be able to adjust accordingly. But I'm not confident that:

               

              1) All fields will always be on the layout

              2) The tab order will always be correct, and include all fields.

               

              I.e., what do I do when there are fields in the list that were not addressed?

               

              In retrospect, I'm not terribly worried about how long it would take to run; it just seemed that a series of replaces would be more graceful than going to every record, then every field, and using a Set Field step.

              • 4. Re: context-free data scrambler
                mikebeargie

                I thought about doing it "from the outside", but soon figured that there would be issues with certain things, e.g. how to handle containers, field repetitions, and other special FileMaker things that sometimes act finicky in ESS.

                 

                Since there are a few more design functions in FileMaker that are useful in this instance (e.g. FieldType() ), I think native FileMaker is the best way to go.

                 

                As noted to Wim below, in retrospect I guess speed isn't necessarily that important to me, so I intend to build out the most robust script I can to handle all "variables" of data that could happen, and test extensively.

                • 5. Re: context-free data scrambler
                  wimdecorte

                  In that case you probably should use the FM metatables and build a data dictionary table (even if it is just in memory) so that you can keep track of which fields have been set. That way you can also process fields that are NOT on any layout but may contain data.

                  • 6. Re: context-free data scrambler
                    steve_ssh

                    Hi Mike,

                     

                    This makes sense.

                     

                    The one thing I would mention again, however, is the metadata tables. Wim also mentioned this in his second post.

                     

                    What I am getting at is that I think the metadata tables would be a more reliable means for you to get your list of field names, versus using FieldNames.  If you are using FM12, then no plugin/ESS should be necessary to go this route.

                     

                    Possibly you could utilize the strengths of each of the methods for getting at your field names:

                     

                      - You could use layouts to determine the list of fields that will be processed (obtaining the list using FieldNames).  This would allow you a simple means for "customizing" which fields get processed, i.e. you just edit the layout.

                     

                       - You could use information from the metadata tables to consider all fields, and generate a report of, for instance, a list of all non-calculation fields which were not processed, which might make the system a tad more robust.
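
The second idea above could look roughly like the following, sketched in Python purely to show the logic. The function name `unprocessed_fields`, the tuple layout, and the class label `"Calculated"` are hypothetical. In FileMaker the schema rows would come from the metadata tables and the layout list from FieldNames.

```python
def unprocessed_fields(schema_rows, layout_fields, skip_classes=("Calculated",)):
    """Report fields that exist in the schema but were never processed.

    schema_rows   -- (table, field, field_class) tuples; assumed to be what
                     a query against the metadata tables would return.
    layout_fields -- fully qualified "Table::Field" names that the layout
                     based pass actually handled.
    skip_classes  -- field classes to ignore; the label is an assumption.
    """
    processed = set(layout_fields)
    missing = []
    for table, field, field_class in schema_rows:
        name = f"{table}::{field}"
        if field_class in skip_classes:
            continue  # calculation fields cannot be set, so do not report them
        if name not in processed:
            missing.append(name)
    return sorted(missing)


if __name__ == "__main__":
    rows = [
        ("Contacts", "Name", "Normal"),
        ("Contacts", "FullName", "Calculated"),
        ("Contacts", "Notes", "Normal"),
    ]
    print(unprocessed_fields(rows, ["Contacts::Name"]))
```

An empty result would mean every non-calculation field in the schema was covered by some layout. Anything listed is either a deliberate omission or a layout that was not updated after a schema change.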

                     

                    Very best,

                     

                    -steve

                    • 7. Re: context-free data scrambler
                      mikebeargie

                      This sounds like the right path, thanks again Wim.

                      • 8. Re: context-free data scrambler
                        mikebeargie

                        Thanks Steve,

                         

                        I did want to make an FM11 compatible version of this, but it's not a make or break requirement.

                         

                        I'm not finding much on the use of metadata tables, could you point me in the right direction of a whitepaper or functions guide?

                        • 9. Re: context-free data scrambler
                          steve_ssh

                          Hi Mike,

                           

                          Absolutely:

                           

                          Andrew Duncan of Databuzz wrote a nice blog post about this topic, which is here:

                           

                          http://www.databuzz.com.au/using-executesql-to-query-the-virtual-schemasystem-tables/

                           

                          I think it's got everything you need to get up and running with this.

                           

                          Very best,

                           

                          -steve

                          • 10. Re: context-free data scrambler
                            mikebeargie

                            Fantastic, thanks for all the help!