43 Replies Latest reply on Jun 4, 2012 12:32 PM by disabled_jackrodgers

    Comparing Get ( UUID ) with existing solutions

    jbante

      I've been using my own UUID function for a while, so I was eager to see how it and other UUID solutions the FileMaker community has developed compare to the new Get ( UUID ) function. I put a test file together, and I encourage anyone interested to download it and run the tests for themselves in case someone runs into something I didn't, or in case someone just doesn't believe me. (I chose not to use any sub-second timing, like FMBench, for portability reasons, though I'm sure the results would be interesting. Here's hoping FileMaker 13 supports "SetPrecision ( Get ( CurrentTimestamp ) ; 6 )".) Here's what I found:

       

       

      1. Get ( UUID ) calculates very fast — its performance is similar to the empty custom function I used as a baseline to measure the overhead of running the tests.

       

       

      Good calculation speed is worth pursuing, but if you're using the value as a primary key in record data, it's probably less important. The value will only be calculated once for the life of each record, but may be referenced for finds and relationship matches several times and will contribute to file size over the entire life of the record. With that in mind...

       

       

      2. Finds on IDs that can be stored in number fields are consistently much faster than finds on IDs stored in text fields. In my testing, they were at least twice as fast, and usually an order of magnitude faster.

       

       

      3. My test of the time to Count () related records matched on UUID showed no meaningful performance difference between the formats I tested. ExecuteSQL () similarly showed no meaningful difference.

       

       

      4. The IDs that can be stored in number fields result in file sizes slightly smaller than Ray Cologon's Base 36 solution (despite the number values being longer), which is significantly smaller than a file using Get ( UUID ).

       

       

      I'm pleased that FileMaker introduced a Get ( UUID ) that developers can standardize on; but without a change in the performance characteristics of text fields (required for storing the value) compared to number fields, I'm considering sticking with numeric UUIDs.

       

       

      A Get ( UUID ) value can be converted from base 16 to base 10 for storage in a number field. This completely eliminates the calculation speed advantage of Get ( UUID ), but I did say above that this may not be the top priority. Find performance and file size for this converted value are modestly better than for my own numeric UUIDs. (The value is a couple of digits shorter, which I'm presuming makes indexes a little smaller.) The converted value doesn't carry any of the embedded information that some of the other functions include, but that's better for certain applications anyway.
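To make the base 16 to base 10 conversion concrete, here is a minimal sketch in Python rather than FileMaker calculation syntax. The function name `uuid_hex_to_decimal` is hypothetical, and Python's standard `uuid` module stands in for Get ( UUID ) purely to produce an example value; this is not how the test file performs the conversion.

```python
import uuid


def uuid_hex_to_decimal(uuid_text: str) -> int:
    """Convert an 8-4-4-4-12 hexadecimal UUID string to its base-10 integer value."""
    hex_digits = uuid_text.replace("-", "")
    if len(hex_digits) != 32:
        raise ValueError("expected 32 hexadecimal digits")
    # int() with base 16 parses all 32 digits at once.
    return int(hex_digits, 16)


# Stand-in for a Get ( UUID ) result (assumption: any version 4 UUID string).
example = str(uuid.uuid4())
as_number = uuid_hex_to_decimal(example)
print(example, "->", as_number)
```

A 128-bit value needs at most 39 decimal digits, so the numeric form is not much longer than the 32 hexadecimal digits plus 4 hyphens of the text form.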

        • 1. Re: Comparing Get ( UUID ) with existing solutions
          HOnza

          Jeremy, that's very useful analysis.

           

          If I have some spare time I would like to test your sample file to see if I can find a way to make usage of the Get(UUID) function more efficient but I am sure your test results are already very useful for many developers.

           

          Thanks for sharing.

           

          HOnza

          • 2. Re: Comparing Get ( UUID ) with existing solutions
            jbante

            I came up with a calculation that will produce a value equivalent to converting Get ( UUID ) from base 16 to base 10, but much faster, for anyone interested in a standards-compliant-yet-performant solution:

             

            Floor ( Random * 2^48 ) * 2^80
            + 302231454903657293676544          // 4 * 2^76, version number for random UUID
            + Floor ( Random * 2^12 ) * 2^64
            + 9223372036854775808          // 2 * 2^62, reserved bits, indicating UUID scheme
            + Floor ( Random * 2^62 )

             

            If you need to convert this back to the hexadecimal representation, just use one of the many available custom functions for converting base 10 to base 16, pad it with enough zeros to make that result 32 digits long, and insert hyphens where appropriate. (The format is 8-4-4-4-12.)
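For readers who want to check the bit layout outside FileMaker, the same construction can be sketched in Python. This is an illustration under assumptions, not the custom function from the test file: `random_uuid_decimal` and `decimal_to_uuid_text` are hypothetical names, and `secrets.randbits` replaces the `Floor ( Random * 2^n )` terms.

```python
import secrets
import uuid

VERSION_4 = 4 << 76         # 4 * 2^76, version number for random UUID
VARIANT_RFC4122 = 2 << 62   # 2 * 2^62, reserved bits, indicating UUID scheme


def random_uuid_decimal() -> int:
    """Build a version 4 UUID as one integer from three random parts."""
    return (
        (secrets.randbits(48) << 80)    # top 48 random bits
        + VERSION_4
        + (secrets.randbits(12) << 64)  # 12 random bits below the version
        + VARIANT_RFC4122
        + secrets.randbits(62)          # low 62 random bits
    )


def decimal_to_uuid_text(value: int) -> str:
    """Format the integer as zero-padded hexadecimal in 8-4-4-4-12 groups."""
    digits = format(value, "032x")
    groups = (digits[0:8], digits[8:12], digits[12:16], digits[16:20], digits[20:32])
    return "-".join(groups)


value = random_uuid_decimal()
print(value)
print(decimal_to_uuid_text(value))
# Cross-check against the standard library's own parsing of the same integer.
print(uuid.UUID(int=value).version)
```

The two constants match the literals in the calculation above, and the standard library reads the resulting integer as a version 4, RFC 4122 variant UUID.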

            • 3. Re: Comparing Get ( UUID ) with existing solutions
              theo@tekainc.com

              Thanks Jeremy, I appreciated your earlier custom functions as well. I'm thinking out loud, but I am curious whether anyone has figured out if you can stuff a binary value into a field, and whether FileMaker will create a value index for it. That should be faster than a string compare, but perhaps not faster than an integer compare, which is why it would be nice to convert the number into a decimal value and test the total time to create a UUID versus Go to Related Record on a key equijoin with different data types. Anyone?

              • 4. Re: Comparing Get ( UUID ) with existing solutions
                HOnza

                Jeremy, I may be just too tired, but where is the Get(UUID) being converted in your calculation?

                 

                HOnza

                • 5. Re: Comparing Get ( UUID ) with existing solutions
                  jbante

                  The calculation I posted doesn't convert Get ( UUID ); it's a faster-running short-cut that calculates a value equivalent to converting Get ( UUID ) without actually having to parse hexadecimal characters one-by-one.

                  • 6. Re: Comparing Get ( UUID ) with existing solutions
                    HOnza

                    What do you mean by "value equivalent to converting Get ( UUID )"?  Do you mean that's what Get ( UUID ) does internally?

                     

                    HOnza

                    • 7. Re: Comparing Get ( UUID ) with existing solutions
                      jbante

                      Get ( UUID ) follows the RFC 4122 standard. The value I'm generating follows the same standard, but in decimal instead of hexadecimal.

                      • 8. Re: Comparing Get ( UUID ) with existing solutions
                        theo@tekainc.com

                        Get ( UUID ) appears to follow RFC 4122 for mode 4 (random) in the hyphen-delimited hexadecimal format, but why are we out here scratching our heads like a bunch of baffled chimps? Why not write it up properly in the documentation? When I wrote my initial formal request to FMI for UUID support (was it 4 or 5 years ago, let me see...), I asked for at least mode 4, and also for auto-enter UUID. I asked that they pay special attention to data formats and interoperability with databases such as Oracle, which stores the auto-entered key as a binary value. I also provided information and links on how to obtain a UUID in either Windows or Mac OS X by calling OS frameworks. I'm actually more impressed with Jeremy for his benchmark and custom function, which I have already deployed. Kudos and thanks for the teamwork!

                         

                        Theo

                        • 9. Re: Comparing Get ( UUID ) with existing solutions
                          HOnza

                          Thanks, Jeremy, now I understand. However, now I am more than ever hesitant to use this kind of UUID.

                           

                          To be honest, I have not studied the RFC4122 or any other UUID standards in detail. But I intuitively do not expect an algorithm based on 3 Random function calls to be 100% reliable. Sure, the probability of generating the same value twice is very low but not really zero. For this to be 100% reliable, the Random function itself would have to generate a universally unique number, and then it would make no sense to call it 3 times.

                           

                          So I personally tend to prefer a scheme that's 100% reliable.

                           

                          Anyway, for those who are OK with the "mode 4" RFC4122 UUID, your calc together with the better performance of relationships based on number fields is definitely very useful.

                           

                          Thank you!

                           

                          HOnza

                          • 10. Re: Comparing Get ( UUID ) with existing solutions
                            pantarhei

                            Excellent info – thanks!

                             

                            Incidentally, only a couple of months ago I seemed to have a need for UUIDs, but it went away for the time being. One of these days I'm likely to need them, though. I'm not going to touch existing solutions, but there may be new ones coming. (That is, if I actually accept new FMP projects, which is indeed questionable before the "theme" situation is fixed.)

                            • 11. Re: Comparing Get ( UUID ) with existing solutions
                              jbante

                              For those uncomfortable with randomly generated UUIDs, there are also functions for UUIDs based on timestamp and device NIC address, available from the usual sources and included in the demo file. No scheme is 100% perfect, though. With all schemes, duplicates are possible, just astronomically unlikely. For what it's worth, version 1 (timestamp & NIC address) UUIDs are easier to duplicate on purpose than version 4. I imagine random UUIDs were adopted as the most popular version out of a combination of privacy and performance concerns. There are other versions using cryptographic hashes (version 3 with MD5 and version 5 with SHA-1) that make it difficult to reverse-engineer any original data, but computing a cryptographic hash is relatively slow compared with drawing random bits.
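To put a rough number on "astronomically unlikely", here is a small Python sketch of the standard birthday-bound approximation for version 4 UUIDs. It assumes all 122 random bits are drawn uniformly and independently, which may not hold for every random number generator (including FileMaker's Random function); `collision_probability` is a hypothetical helper name.

```python
import math

RANDOM_BITS = 122  # 128 bits minus 4 version bits and 2 variant bits


def collision_probability(record_count: int, random_bits: int = RANDOM_BITS) -> float:
    """Approximate chance of at least one duplicate among record_count random IDs."""
    possible_values = 2 ** random_bits
    pairs = record_count * (record_count - 1) / 2
    # 1 - exp(-pairs / N), computed with expm1 so tiny results are not rounded to 0.
    return -math.expm1(-pairs / possible_values)


for n in (10**6, 10**9, 10**12):
    print(f"{n:>16,d} records: {collision_probability(n):.3e}")
```

Under that idealized assumption the risk stays negligible even for very large tables; a weak or poorly seeded generator would make duplicates more likely than this estimate suggests.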

                              • 12. Re: Comparing Get ( UUID ) with existing solutions
                                theo@tekainc.com

                                UUIDs are especially useful for 'partitioned' datasets, where mobile and other offline users need to create records and later sync to the main database with a guarantee of no key clashes. They are also useful when you need to combine different datasets. The timestamp version keeps the records in time-created sort sequence, which can be a plus, but as Jeremy mentioned, it has the potential to reveal more info than some wish, which is why random is usually favored. FMP does not produce proper high-precision timestamps: the data type can hold fractions of a second, but FileMaker never generates them, which is weird since they are available in any OS. So someone made a decision to lobotomize the timestamp function, which should be fixed. We need binary and interval data types desperately in FileMaker!

                                • 13. Re: Comparing Get ( UUID ) with existing solutions
                                  pantarhei

                                  In my main line of business, BIM (Building Information Management), UUID is the standard for identifying each and every component of a building. OK: we may not identify every hinge, lock, stock & barrel, but not far from that. Even in a moderate project there are hundreds of thousands of components; in BIM-based facilities management systems, easily tens of millions.

                                   

                                  I have yet to hear of problems related to this, but this is cutting-edge stuff, implemented by only a few significant bodies (i.e., government agencies).

                                  • 14. Re: Comparing Get ( UUID ) with existing solutions

                                    A timestamp is not guaranteed to be unique, as it can be duplicated easily. Set a field to auto-enter the record's creation timestamp and show it in a table view. Hold down Cmd/Ctrl + N and see how many records have the same timestamp.
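The same effect is easy to reproduce outside FileMaker. This Python sketch assumes whole-second resolution, matching the timestamp behavior described in this thread; `whole_second_timestamps` is a hypothetical helper, and the loop merely stands in for holding down the new-record shortcut.

```python
import time
from collections import Counter


def whole_second_timestamps(count: int) -> list:
    """Collect `count` timestamps truncated to whole seconds, as fast as possible."""
    return [int(time.time()) for _ in range(count)]


stamps = whole_second_timestamps(1000)
# Every occurrence of a value beyond its first counts as a duplicate.
duplicates = sum(n - 1 for n in Counter(stamps).values())
print(f"{len(set(stamps))} distinct values among {len(stamps)} timestamps, "
      f"{duplicates} duplicates")
```

Anything created faster than the clock ticks shares a timestamp, which is why a timestamp alone cannot serve as a unique key.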

                                     

                                    I tried NIC (now Persistent ID) + timestamp + random for a more likely unique id since the timestamp can be duplicated.

                                     

                                    No such attempt to create a unique ID out of thin air can be guaranteed unique, and as we say in a poker tournament, 1 in 1,000 happens. Murphy would say it happens at the time when it will cause the most damage. And as we remarked 20 years ago, a $1000 motherboard will blow up to protect a $0.05 fuse.

                                     

                                    I discovered in 1986 that the Tandy 1000 only pretended to generate random numbers: it kept a fixed set of 50 'random' numbers, and one of them was randomly selected each time.

                                     

                                    36 to the 32nd power is a lot of possible combinations...

                                     

                                    My preference would be to use FileMaker's tool so that I would not be blamed if a duplicate appeared in my own private calculation...

                                     

                                    Of course it is presently unproven in the wild and we are the beta testers, again.
