1 2 Previous Next 19 Replies Latest reply on Jan 16, 2016 7:44 AM by srzuch

    Best Practice: Code vs. ID?

    rmittelman

      I'm beginning the process of re-writing an existing Access application in FileMaker 14, and could use a bit of guidance.  The application manages student funding.

       

      Schools table has fields like Name, FullName, etc.

      "UCLA", "University of California, Los Angeles"

      "USC", "University of Southern California"

       

      Students table has fields like SSN, Name, etc.

      "123456789", "John Smith"

      "987654321", "Jane Doe"

       

      The first question is whether I should use auto-number IDs in addition to the Name and SSN fields?  While I don't really need them, would they help speed up relationships and downloading lists of records when I move the application to Server?  In case I use HyperList, would it function better using a numeric ID?  When I choose a school, the application would need to download a list of hundreds or thousands of students to display in the UI.

       

      The second question is about storage of calculated fields.  In what conditions is it better to not store at all, and when is it better to store them?  I'm guessing things like formatted phone numbers and formatted names or SSN's wouldn't need to be stored, right?  But what if I need to use a formatted field in a relationship or sort by it?

       

      TIA...

        • 1. Re: Best Practice: Code vs. ID?
          Mike_Mitchell

          You should never use anything that's user-editable as a key field. All it takes is for one user to make a change and the relationship breaks. That's why key fields should generally be invisible to the user and populated via automatic means.

           

          Unstored calculation vs. stored is essentially a matter of what you're going to do with them. If you're going to be doing any searching on them, they should be indexed for performance reasons - which means stored. Any field used as the target (child side) of a relationship has to be indexed, so it can't be an unstored calc. (You can use unstored calcs as key fields on the parent side.)

           

          All calculations bear a performance penalty. With stored calculations, you pay the penalty when it's calculated and stored (which is when anything it depends on updates). Unstored calculations update whenever they're called, like when they're displayed or used in another calculation or script.

           

          Based on these considerations, I generally avoid unstored calculations. When data can be stored, it should be. There are a few exceptions, like when they have to be unstored (due, for example, to referencing data in a related table). However, even in those cases, if searching or sorting is going to be done based on the value, it's often (maybe even usually) valuable to echo the data into the parent table rather than using an unstored calculation.

           

          In fact, I tend to prefer auto-enter calculation entry rather than calculation fields. While the performance penalty isn't really much different, it allows the user to edit the value after the entry (which can be useful, or a problem, depending on the workflow).

           

          HTH

           

          Mike

          • 2. Re: Best Practice: Code vs. ID?
            DanielShanahan

            • I second Mike's advice.  Create an ID field for your tables, even if one or more fields already have unique values.

             

            • Reducing un-stored calc fields will increase performance when querying those fields (e.g. total of an invoice).

            • 3. Re: Best Practice: Code vs. ID?
              rmittelman

              Thanks guys, for the quick advice.  I appreciate it.

               

              Regarding calculations:

              What about fields that almost never require recalculation?  I'm thinking of how I do phone numbers.  Unless FM came up with something that I haven't discovered yet, the nicest way to do phone numbers is have a Phone field and a PhoneFormat calculated field, which is right on top of the Phone field in layouts, and set as not being available and not a tab stop.  It displays the phone like "(888) 555-1212".  When I click on, or tab to the field, I instead land on the Phone field, where the number is entered like "8885551212". When I tab out of that field, the formatted number is displayed again.  It seems this field would only be recalculated when the record is displayed, right?  Oh-oh.. continuous layout would have to calculate all displayed phone numbers.  Maybe I'm worrying too much about saving space in the table...

               

              Regarding key fields:

              In my Access database, the user is not permitted to change the key field except in the master table itself.  I've set relationships to cascade the updates, so therefore they could change the school "code" if they wanted.  That being said, I don't know how that would work in FM Pro.  It sounds like you're both recommending using only auto-indexing fields as the key field, and then just display the school code or student SSN as needed, right?  Probably allow changing of those in the master tables, but not in any child tables.

              My question was more on the nature of retrieving data from the server, and relative speed between using text field for the key vs. number field.  I thought I read something about the latter being much faster, especially if using something like HyperList.

               

              Thanks again...

              • 4. Re: Best Practice: Code vs. ID?
                Mike_Mitchell

                For the phone number, I'd say do neither. Just have one field and use an auto-enter calculation that replaces the existing data. Here's one I use:

                 

                Let ([

                source = Case ( not IsEmpty ( phone ) ; phone ; Self ) ;

                numericOnly = Filter ( source ; "0123456789" )

                ] ;

                 

                Case ( Length ( numericOnly ) = 10 ;

                 

                "( " & Left ( numericOnly ; 3 ) & ") " & Middle ( numericOnly ; 4 ; 3 ) & "-" & Right ( numericOnly ; 4 ) ;

                 

                Length ( numericOnly ) = 7 ;

                 

                Left ( numericOnly ; 3 ) & "-" & Right ( numericOnly ; 4 )

                 

                )

                )

                 

                As far as numeric vs. text keys, it's true that numbers will index faster and lighter. In most cases, it's not a big deal. One issue with serial numbers (and the reason I avoid them) is they have to be reset every time you do any importing / cloning / etc. It's a pain and forgetting to do it will result in duplicate keys. Also, if you're using any kind of syncing strategy, then serial numbers simply won't work. So I just use UUIDs and move on, since any solution has the potential to need a mobile version requiring sync.

                 

                FWIW

                 

                Mike

                • 5. Re: Best Practice: Code vs. ID?
                  rmittelman

                  Thanks Mike,

                  I tried the auto-enter calculation and it works very well.  The only thing is, when I come back to the field, it still has all the formatting there.  So it's only working as expected when you enter it the first time OR select the entire text and replace it with a number.  The way I described turns it back into a raw number each time you visit the field.  Hard to say which one's better.

                   

                  I like your idea about using a UUID.  It's unclear whether that is more or less efficient than a number field when retrieving many records.  I read this article http://filemakerstandards.org/display/bp/Key+values which discusses using UUIDs, but it's still unclear.  Is a UUID guaranteed to give a unique ID when merging new data entered on a mobile device into your main data store on the server?  I'd be worried about duplicate keys being created.

                  • 6. Re: Best Practice: Code vs. ID?
                    keywords

                    There is a close to zero possibility of Get ( UUID ) producing duplicate values. As Mike has pointed out, this approach gets around the issues with serial numbering when updating files, importing records, etc. where there is a fairly high probability of duplicates arising unless you leave no stone unturned to avoid this.

                    • 7. Re: Best Practice: Code vs. ID?
                      Mike_Mitchell

                      I've never had a problem using UUIDs on a mobile device. That's the primary reason to use them.

                       

                      There is a very small statistical chance of a duplicate, but it's so rare as not to be worth worrying about. You're virtually guaranteed to get a duplicate with serial numbers in that scenario.

                       

                      The operational advantages of UUIDs far outweigh any performance benefit from the differences in indexing. At least IMHO.

                      • 8. Re: Best Practice: Code vs. ID?
                        rmittelman

                        Thanks to everybody who answered this thread.  It's been a big help!

                        • 9. Re: Best Practice: Code vs. ID?
                          DavidJondreau

                          FM doesn't enforce referential integrity like Access does. I would recommend not permitting users to change primary keys. If you need them to, you'll be doing a lot of scripting.

                          • 10. Re: Best Practice: Code vs. ID?
                            srzuch

                            Also, an Access developer converting to Filemaker ...

                             

                            I thought you can use the Filemaker security model to enforce referential integrity  --  preventing deleting parent records with orphans.

                             

                            Filemaker supports cascading deletes, but not cascading changes.

                             

                            Steve

                            • 11. Re: Best Practice: Code vs. ID?
                              Mike_Mitchell

                              Not the security model, but the relationships model. It’s enforced via the Relationships Graph.

                              • 12. Re: Best Practice: Code vs. ID?
                                srzuch

                                There is no option to enforce referential integrity via the relationship graph, unless you are referring to the option to delete all child records if the parent record is deleted.  In Microsoft Access speak, referential integrity refers to preventing the deletion of the parent record if there exists any child records.  That is what I was referring to.

                                • 13. Re: Best Practice: Code vs. ID?
                                  srzuch

                                  Yes it does.  In the security model, you can prevent deletion of a parent record if any child records exist, by using custom record privileges and limiting the ability to delete the parent record if any child record exists.

                                  • 14. Re: Best Practice: Code vs. ID?
                                    Mike_Mitchell

                                    srzuch wrote:

                                     

                                    There is no option to enforce referential integrity via the relationship graph, unless you are referring to the option to delete all child records if the parent record is deleted.  In Microsoft Access speak, referential integrity refers to preventing the deletion of the parent record if there exists any child records.  That is what I was referring to.

                                     

                                    "If you would discourse with me, you must first define your terms."  - Voltaire   

                                     

                                    It seems "Microsoft-speak" isn't the same thing as the industry standard. From What Is Referential Integrity? - Definition & Examples:

                                     

                                    Referential integrity enforces the following three rules:

                                    1. We may not add a record to the Employees table unless the ManagedBy attribute points to a valid record in the Managers table.
                                    2. If the primary key for a record in the Managers table changes, all corresponding records in the Employees table must be modified using a cascading update.
                                    3. If a record in the Managers table is deleted, all corresponding records in the Employees table must be deleted using a cascading delete.

                                    Referential integrity - Wikipedia, the free encyclopedia::

                                     

                                    For referential integrity to hold in a relational database, any field in a table that is declared a foreign key can contain either a null value, or only values from a parent table's primary key or a candidate key.[2] In other words, when a foreign key value is used it must reference a valid, existing primary key in the parent table.


                                    What is Referential Integrity? Webopedia


                                    For example, suppose Table B has a foreign key that points to a field in Table A. Referential integrity would prevent you from adding a record to Table B that cannot be linked to Table A. In addition, the referential integrity rules might also specify that whenever you delete a record from Table A, any records in Table B that are linked to the deleted record will also be deleted. This is called cascading delete. Finally, the referential integrity rules could specify that whenever you modify the value of a linked field in Table A, all records in Table B that are linked to it will also be modified accordingly. This is called cascading update.



                                    Now, some of these articles mention prohibiting the parent from being deleted if there are children, so that's one method to enforce it. But to say there's "no option" to enforce it via the Relationships Graph isn't true.


                                    (Worth noting that the Wikipedia entry differs from the other references in that it allows a null value in a foreign key. To me, that's a bit off, because you still have an orphaned record. But okay.)

                                     

                                    Personally, I prefer the cascading delete to preventing the parent from being deleted. What if the parent needs to be deleted? Then you're into bypassing the referential integrity you've enforced. Preventing a parent from being deleted might conflict with business rules. But, to each his own.

                                    1 2 Previous Next