11 Replies Latest reply on Sep 3, 2016 11:51 AM by Henry_2

    Record creation, only if it does not yet exist

    Henry_2

      Hi,

       

      In a record creation script I'm unsure how to prevent looping through records it has already created (preventing that would save a lot of time). Hope someone is able to help out:

       

      I have these five tables (altered a bit for clarity):

      1) stores (eg 'english store' and 'dutch store')

      2) products

      3) images

      4) product_image_join (key fields are: productId, imageId)

      5) product_image_join_store_specific (key fields are: productId, imageId, storeId)

       

      Now for each record that exists in table 4 I want to create records in table 5, for each store. For example because the records would hold info whether to show the image for the store or not.

       

      What does succeed?

      I can create a script looping all records of table 4, and then for each store creating records in table 5. I currently create these records using table relations (connected three global fields of a selector table with the three key fields of table 5), I believe the method I use is also known as magic key. That runs reliable.
      Unfortunately it will loop all, also those already created, simply because I don't check at all which still need creation or not.

       

      My challenge

      How do I check which have already been created and only create the new ones?

       

      I was thinking about doing a sql query that would check which combinations do not yet exist and then create only those, I have not yet figured out just yet which steps to take exactly to get this working.

       

      Hope someone is willing to help me out,

       

      Thanks,

       

      Henry

        • 1. Re: Record creation, only if it does not yet exist
          erolst

          Henry_2 wrote:

          I can create a script looping all records of table 4, and then for each store creating records in table 5. I currently create these records using table relations (connected three global fields of a selector table with the three key fields of table 5), I believe the method I use is also known as magic key. That runs reliable.

          Unfortunately it will loop all, also those already created, simply because I don't check at all which still need creation or not.

           

          All you would have to do is set the keys to create a new record, as you do now, then check if you can read any values via a relationship** (usually the primary key is used for that, but any of those three fields should do, since by definition they should not be empty).

           

          If you can't, create the record, else don't. So instead of writing unconditionally, wrap your creation code into an If[]:

           

          If [ isEmpty [ ImageJoinStore_checkTO::anyNonEmptyField ) // that field combination, i.e. record does not exist ]

          # [ creation code ]

          End If

           

          **You'd have to create a new relationship that uses the three global fields, rather than the MK field.

           

          You could also create a calculated key from the three fields, then get a List() of all keys, then use a loop and FilterValues () to check a calculated value from the three globals against that list. Might be faster because there wouldn't be a field access per iteration.

          • 2. Re: Record creation, only if it does not yet exist
            Henry_2

            erolst wrote:



            All you would have to do is set the keys to create a new record, as you do now, then check if you can read any values via a relationship** (usually the primary key is used for that, but any of those three fields should do, since by definition they should not be empty).

             

            If you can't, create the record, else don't. So instead of writing unconditionally, wrap your creation code into an If[]:

            Thanks, that would be more best practice. It might not be the most time consuming part though.

             

             

            erolst wrote:

             

             

            You could also create a calculated key from the three fields, then get a List() of all keys, then use a loop and FilterValues () to check a calculated value from the three globals against that list. Might be faster because there wouldn't be a field access per iteration.

            Thanks, makes sense and would probably speed things up.

            You would however still be looping the entire list right, checking each against the existing list? Could there also be an even quicker way by creating a list of the records that need to be created (eg identified by 'calculated keys' also) and then subtract the list of already created records (again identified by their 'calculated keys'). Then only loop the remaining list for record creation. (Or any other way to avoid FM to loop each value.)

            • 3. Re: Record creation, only if it does not yet exist
              erolst

              You ill still need to loop through the stores, but you can forego the inner loop and the check like this:

               

              • create a global field in the first table (ImageProduct, IP).

               

              • in the ImageProductStore table (IPS), create a text field with a concatenation of your three IDs (make sure to use a delimiter!); define it as a regular field with an auto-enter calculation, so you can validate it as unique. (You need to re-inject these values for your existing records, e.g. using Replace Field Contents.)

               

              In your script

              • make sure there is an IP TO with a found set of all records (do this once outside of any loop)

              • set the current iteration's store ID to the global in IP

              • go to the second table and import the two real keys and the global store ID.

               

              The validation will reject, i.e. not allow record creation, for any combination of keys that would result in a duplicate.

              • 4. Re: Record creation, only if it does not yet exist
                erolst

                Also, you could use ExecuteSQL to get a list of IPS keys per current store, then get a list of IP keys that are not in the first list (unfortunately, there is no appropriate built-in function in FM – “Anti-FilterValues() …)

                 

                This (and other strategies) would btw. be less awkward if you would simply relate IPS to IP via a single IP primary / IPS foreign key, rather than repeat both the I and the P key.

                • 5. Re: Record creation, only if it does not yet exist
                  Henry_2

                  erolst wrote:

                   

                  Also, you could use ExecuteSQL to get a list of IPS keys per current store, then get a list of IP keys that are not in the first list (unfortunately, there is no appropriate built-in function in FM – “Anti-FilterValues() …)

                  Thank you for your solutions! Learned from your import strategy example, I'm now back to my initial strategy using ExecuteSQL. I now get to the right list of IP keys (per store loop) that need creation in the IPS table. I do still have a question though, how to move to record creation from this.

                   

                  Brief summary of what I did: I tried first without a composit key (is composit key what it's called?):

                  SELECT (productId||'__'||imageId) as composit

                    FROM IPS

                    Where storeId = '" & $storeId & "'

                  That took +-25 seconds for getting a selection of 330.000 records, it took +-7secs after adding the composit key natively in the IPS table (auto-enter).

                   

                  Then getting the IP keys could also be like this I think:

                  SELECT (productId||'-'||imageId) as composit

                    FROM IP

                  WHERE composit NOT IN ($ipsArray)

                  (or using composit key within the table again, for performance)

                  EDIT: it seems the 'NOT IN..' is taking forever (30 mins and counting) when using the script on the full set, any suggestions?

                  The question below is only relevant if the ExecuteSQL performance issue can be fixed.

                   

                  Now that brings me the list of IP keys to be created.

                  How would you subsequently move forward to the record creation? Since I would still need the three real keys for record creation in IPS.

                   

                  I see some options:

                  option A: Would you adjust the sql query above like this:

                  SELECT (productId||'-'||imageId) as composit, productId, imageId

                    FROM IP

                    WHERE composit NOT IN ($ipsArray)

                  and then get the second and third column (productId, imageId) with a (custom) function, eg while looping the list for record creation? I'm unsure how best to get productId and imageId from each value in the list, I presume I can easily find an answer though.

                   

                  option B: in a record creation loop fetch the productId and imageId by looping all IP records (selecting the record by composit id) and subsequently for each create the record in IPS, using the storeId (from current store loop iteration), productId and imageId. Downside here I think would be accessing all relevant IP records again.

                   

                  Perhaps a better option C?

                   

                  Thanks,

                   

                  Henry

                   

                  PS: I am using separate keys in IPS, because there might be record creations in IPS from an external source, which would only know the storeId, productId and imageId. It might not know of any id's in IP. So at this time I think keeping the multi keys would be safest, or spend some time understanding impact on other area's

                  (edited the PS)

                  • 6. Re: Record creation, only if it does not yet exist
                    erolst

                    I'd say create a composite IP key in both IP and IPS because that should be faster and relaxes the script code.

                     

                    (I usually define an array delimiter (say '|', or '§§§") as a Custom Function; if you reference that in both the field calculations and the script code, you don't have to memorize it.)

                     

                    Now you could use

                     

                    # [ get store key list, set store count ]

                    # [ outer Store ]

                    Loop

                      Exit Loop if [ Let ( $storeIterator = $storeIterator + 1 ; $storeIterator > $storeCount ) ]

                      Set Variable [ $currentStoreKey ; GetValue ( $storeKeyList ; $storeIterator ) ]

                      Set Variable [ $currentIPlist ;

                       ExecuteSQL ( "

                         SELECT compositeKey

                         FROM IP as ip

                         WHERE ip.compositeKey NOT IN

                           ( SELECT compositeKey FROM IPS as ips WHERE ips.storeID = ? )

                        " ; "" ; "" ; $currentStoreKey

                        )

                      ]

                      If [ not IsEmpty ( $currentIPlist ) ]

                        Set Field [ yourStoreGlobal ; $currentStoreKey ]

                        Set Variable [ $IPiterator ; 0 ]

                        Set Variable [ $currentIPcount ; ValueCount ( $currentIPList ) ]

                        # [ inner IP list ]

                        Loop

                          Exit Loop if [ Let ( $IPiterator = $IPiterator + 1 ; $IPiterator > $currentIPcount ) ]

                          Set Variable [ $currentEntryAsList ; Substitute ( GetValue ( $currentIPList ; $IPiterator ) ; ArrayDelimiter ; ¶ ) ]

                          Set Field [ yourImageGlobal ; GetValue ( $currentEntryAsList ; 1 ) ]

                          Set Field [ yourProductGlobal ; GetValue ( $currentEntryAsList ; 2 ) ]

                          # [ Magic Key creation code goes here ]

                        End Loop

                      End If

                    End Loop

                     

                    As usual, lots of lists to keep track of …

                    • 7. Re: Record creation, only if it does not yet exist
                      Henry_2

                      Thanks a lot, I was doing something like this part:

                          WHERE ip.compositeKey NOT IN

                             ( SELECT compositeKey FROM IPS as ips WHERE ips.storeID = ? )

                      Using a variable, eg: NOT IN ( $ipsArray), that query however seemed to take forever to complete, I killed FM after an hour or so. Would you expect when using a subquery immediately will solve this? I'll try later.

                       

                      FYI: also tried AntiFilterValues cf, but also for that function the lists seem to be too large.

                      • 8. Re: Record creation, only if it does not yet exist
                        Henry_2

                        erolst wrote:

                        ...

                        ExecuteSQL ( "

                        SELECT compositeKey

                        FROM IP as ip

                        WHERE ip.compositeKey NOT IN

                        ( SELECT compositeKey FROM IPS as ips WHERE ips.storeID = ? )

                        " ; "" ; "" ; $currentStoreKey

                        )

                        ...

                        Tested with this (slightly different as I'm used to create the query in a Let), it however takes again way too much time, currently running for 30 mins and counting. IP has about 150.000 and IPS about 300.000 records. CPU seems busy. Do you think it should perform well with these amounts of records, any idea?

                         

                        While learning a lot (big thank you), it seems I have to get back to one of the other strategies. Eg.

                        A) using your import suggestion, perhaps add some logic to constrain the found set of the import source table first, for performance, to avoid 100.000's of records to import each time.

                         

                        B) I'm thinking about my initial setup, looping all stores and IP, but then like this:

                        first add a field to the IP table, eg 'recordsCreatedForStoreIds' which will contain value list of the storeIds already created a record for in IPS

                        Then in the script I would:

                        1. constrain found set (or use sql query) of IP records, to all records where store of current loop is not yet present in field IP::recordsCreatedForStoreIds, then:
                        2. loop all IP records, for each create IPS record for store of current loop.
                          After each IPS record creation also add the store to the values in IP::recordsCreatedForStoreIds 
                          (I might need to do some thinking how to add the value, without the need to read the current contents, perhaps add storeId¶, or ¶storeId always, even though there might not be values in front or after)
                        • 9. Re: Record creation, only if it does not yet exist
                          erolst

                          Henry_2 wrote:

                           

                          I might need to do some thinking how to add the value, without the need to read the current contents, perhaps add storeId¶, or ¶storeId always, even though there might not be values in front or after)

                          That's what List() is for.

                           

                          You could also try this import strategy:

                           

                          Create a relationship from your Utility table (or from wherever you run this all) to a new IPS TO via the global store key, and from there to a new IP TO via both the I and P key.

                           

                          Perform a Go to Related Record, matching only, from your table to IP (via IPS, using the entire found set option in GtRR). Now there are three scenarios: import all, none, or some.

                           

                          1. If there are no related records in IP (you need to trap for an error here), that means there are no related records in IPS, either, so you must import all IPs.

                          2. there are related records – but those you already have. So compare this found set with the total record count

                          a) the two counts are identical; you're done (for this store), because you have all IPs in IPS

                          b) they're not; perform a Show Omitted, and import that set.

                           

                          In both import scenarios, set the current store key to a global in IP so you can import it.

                          • 10. Re: Record creation, only if it does not yet exist
                            Henry_2

                            erolst wrote:

                             

                            Henry_2 wrote:

                             

                            I might need to do some thinking how to add the value, without the need to read the current contents, perhaps add storeId¶, or ¶storeId always, even though there might not be values in front or after)

                            That's what List() is for.

                             

                            of course funny how I can get into thinking too much.

                            • 11. Re: Record creation, only if it does not yet exist
                              Henry_2

                              erolst wrote:

                               

                              ...

                              (I usually define an array delimiter (say '|', or '§§§") as a Custom Function; if you reference that in both the field calculations and the script code, you don't have to memorize it.)

                               

                              ...

                              Set Variable [ $currentEntryAsList ; Substitute ( GetValue ( $currentIPList ; $IPiterator ) ; ArrayDelimiter ; ¶ ) ]

                              Set Field [ yourImageGlobal ; GetValue ( $currentEntryAsList ; 1 ) ]

                              Set Field [ yourProductGlobal ; GetValue ( $currentEntryAsList ; 2 ) ]

                              ....

                              Helped me a lot, eg when I SELECT multiple columns (using ExecuteSQL), to get the individual column values per loop iteration. Previously I split up the columns before the loop, but that used slow custom functions.