29 Replies. Latest reply on Jan 25, 2015 9:53 AM by Mike_Mitchell

    Normalized vs. denormalized DB design in FileMaker

    jurijn

      Hello to everybody!

       

      I was thinking about DB design in FM. I went through the FM Advanced guide and found nothing regarding the topic mentioned above.
      Which is better:
      1. a denormalized design with fewer, bigger tables and fewer table connections, with facts and dimensions?
      2. a normalized DB design with smaller tables and more table connections?

       

      I got these questions thinking about performance.
      Generally, with other DBs there are often two systems:
      - one system for data entry and storage (data integrity and DB size concerns), in normalized form;
      - another system for reporting (particularly for performance: summarized data and query speed), in denormalized form.

       

      For small systems (DBs) that is probably not important. I am a little confused about it all now.
      Should I build a warehouse (star schema) for my reporting schemas?

      Concerning data integrity and performance (and also best practices), should DB schemas be in third normal form?
       

       

      Please share your experience, opinions, or recommendations on the above questions.

       

      Thanks in advance!
      Jurij 

        • 1. Re: Normalized vs. denormalized DB design in FileMaker
          mark_scott

          Hi Jurij,

           

          I'll start things off with a couple quick replies — possibly too quick to be of much help (but my time is a bit limited right at this moment). Anyway…

           

           For sheer performance, narrow tables are often preferred, given the way FileMaker Server moves data to client machines (entire rows at a time, rather than just "SELECT"ed fields). Beyond normalization, developers will sometimes split a wide table into two (with a 1:1 relationship), putting infrequently accessed fields into the second one (potentially slows sorting on those fields, however).
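 
           As a minimal sketch of that split (in generic SQL, purely for illustration; FileMaker schemas are of course defined in the Manage Database dialog, not in DDL, and the table and field names here are hypothetical):
 
               -- Frequently accessed fields stay in the main, narrow table
               CREATE TABLE Contact (
                   id          INTEGER PRIMARY KEY,
                   name_first  VARCHAR(50),
                   name_last   VARCHAR(50),
                   email       VARCHAR(100)
               );
 
               -- Rarely used, bulky fields move to a 1:1 "sidecar" table on the same key
               CREATE TABLE Contact_Extra (
                   contact_id  INTEGER PRIMARY KEY,  -- same value as Contact.id
                   biography   TEXT,
                   notes       TEXT
               );
 
           Because a client always fetches whole records, layouts built only on Contact never pull the bulky Contact_Extra data across the wire.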

           

           As for normalization, I learned about normal forms etc. "way back when," from some colleagues who develop in Access/SQL Server, and am generally a believer in "3rd NF or better." That said, FM is no different from other db environments where performance constraints sometimes compel a less-than-fully-normalized design. A good mantra is to "normalize first" (in early schema design stages), then denormalize intentionally and cautiously as needed, remembering that you'll possibly need to use script triggers or other approaches to keep data in sync. Scripted workflows can often solve multiple problems at once, including keeping tables a bit narrower and relieving some of the issues with denormalized data.

           

          Sorry for the brevity . . . hth,

           

          Mark

          • 2. Re: Normalized vs. denormalized DB design in FileMaker
            Mike_Mitchell

            Mark's recommendations are sound. In general, you should strive for narrow tables, properly normalized. Then, as necessary, break the normalization rules to optimize performance. This normally happens when "proper" normalization creates slowdowns (for example, sorting on related fields).

             

            An example is in order. I had an experience where I properly normalized a join table. Only the key fields were in it. This particular join table was a four-way join (joining four different tables together). A list view was needed for users to select an appropriate record. In order to facilitate finding the correct item quickly, the descriptions were placed on the layout and the layout was sorted sequentially (description from table 1, description from table 2, description from table 3, etc.).

             

            Unfortunately, this created a HUGE performance hit over the WAN, as FileMaker had to load all the records from all the related tables in order to perform the sorting. With the townspeople at the gate with torches and pitchforks, normalization had to give way to copying the necessary description fields into the join table so the sorting would only have to pull from a single table.

             

             As Mark pointed out, script triggers had to be put in place to refresh the descriptions whenever they're updated in the parent tables. That's the price to pay for keeping performance at an acceptable level.
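 
             To make the shape of that concrete, here is a rough sketch in generic SQL (FileMaker would do all of this through Manage Database and a script trigger, and every name below is made up for illustration):
 
                 -- One of the four parent tables (the others look the same)
                 CREATE TABLE TableA (
                     id          INTEGER PRIMARY KEY,
                     description VARCHAR(100)
                 );
 
                 -- The join table: keys to all four parents, plus deliberate
                 -- copies of each parent's description so that sorting the
                 -- list view only ever touches this one table.
                 CREATE TABLE JoinTable (
                     id             INTEGER PRIMARY KEY,
                     a_id           INTEGER,
                     b_id           INTEGER,
                     c_id           INTEGER,
                     d_id           INTEGER,
                     a_description  VARCHAR(100),  -- copy of TableA.description
                     b_description  VARCHAR(100),  -- copies of TableB/C/D likewise
                     c_description  VARCHAR(100),
                     d_description  VARCHAR(100)
                 );
 
                 -- What the refresh effectively does when a description changes
                 -- in TableA (and likewise for the other parents):
                 UPDATE JoinTable
                 SET    a_description = (SELECT description FROM TableA
                                         WHERE  TableA.id = JoinTable.a_id);
 
             The denormalized copies cost a little extra housekeeping on writes, but the sorted list view now reads from a single table.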

             

            HTH

             

            Mike

            • 3. Re: Normalized vs. denormalized DB design in FileMaker
              matthew_odell

              Hey Jurij,

               

               Great questions, and I don't think there's a specific prescriptive set of rules that works for every scenario, but I'll try to give you the best information possible:

               

              First off, this idea that you brought up is a great start:

               Generally, with other DBs there are often two systems:

               - one system for data entry and storage (data integrity and DB size concerns), in normalized form;

               - another system for reporting (particularly for performance: summarized data and query speed), in denormalized form.

               

               The reason this is a great start is that there are two major things that negatively affect FileMaker performance. One of them is having wide tables. As Mark mentions, it's always best to keep tables as narrow as possible, because FileMaker sends all stored data in a record between Server and Client, even if the user is only viewing one field. This plays really well into your first point, not just for data integrity but also for performance.

               

               The other thing that can drastically affect performance is unstored calculations and summaries (when dealing with large datasets). Therefore, doing as much as you can to store the results of unstored calculations or summary data for reports will also aid performance. That said, the first rule still applies (try to keep tables narrow).
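 
               For example, a pre-aggregated reporting table (the "star schema" idea from your question) might look roughly like this. This is just a sketch in generic SQL with made-up names; in FileMaker you'd build the equivalent table and fill it with a scheduled server-side script rather than with SQL:
 
                   -- Stored, pre-aggregated rows that reports can read directly,
                   -- instead of recalculating unstored summaries over the whole dataset
                   CREATE TABLE Sales_Summary (
                       summary_date  DATE,
                       region        VARCHAR(50),
                       order_count   INTEGER,
                       total_amount  DECIMAL(12,2),
                       PRIMARY KEY (summary_date, region)
                   );
 
                   -- What a nightly refresh would effectively do
                   -- (assuming an Orders detail table with order_date, region, amount):
                   INSERT INTO Sales_Summary (summary_date, region, order_count, total_amount)
                   SELECT   order_date, region, COUNT(*), SUM(amount)
                   FROM     Orders
                   GROUP BY order_date, region;
 
               Reports then run against a small, fully stored table instead of thousands of detail records.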

               

              When I teach data modeling, I always start by teaching normalization, but then once the students understand normalization, I talk about when to break the rules for performance. All good data architects do it just for the reasons you mention.

               

              Hopefully that's a helpful start. If you want more information on how to make this happen, let me know.

               

              Matt

              • 4. Re: Normalized vs. denormalized DB design in FileMaker
                mbraendle

                Difficult question. Especially since the meaning of "denormalized" can go very far, to NoSQL databases or models.

                 

                 Much depends on the structure of the data you want to describe (or model). FileMaker is a strong tool if the data has a structure that can be modeled into entities and relations(hips), and as others have said, it is in many cases favorable to normalize to 3rd NF or better. However, if the data is highly unstructured, i.e. where entities and/or relations are not well known or definable a priori, attributes are highly variable, data is more graph- or network-like, or items vary strongly in size and description, other models such as key-value, RDF, XML, neural networks, or information retrieval systems that make use of text statistics are more appropriate. To some extent, it is possible to model such systems in FileMaker. Tables may become very narrow, but the penalty is that the number of records can become very large (in the 100'000s to millions).
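 
                 As a quick sketch of the key-value idea (generic SQL, hypothetical names; in FileMaker this is simply one very narrow table with three fields):
 
                     -- One narrow table holds arbitrary attributes as key-value rows.
                     -- Flexible, but record counts grow fast: one row per attribute per item.
                     CREATE TABLE Attribute (
                         item_id    INTEGER,
                         attr_name  VARCHAR(100),
                         attr_value TEXT,
                         PRIMARY KEY (item_id, attr_name)
                     );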

                If you are interested in these topics, see talks and examples (in German) given by Volker Krambrich and me at the 2013 and 2010 German FileMaker conferences about data models, search strategies and information retrieval.

                 

                Examples that go beyond entity-relationship models:

                - a large corpus of unstructured text  (information retrieval)

                - biosequences (DNA, RNA, proteins) (information retrieval, XML)

                - graph-like structures (e.g. people and things and their attributes and relations between them)  (key value, RDF)

                - (a set of) molecules - which are graphs - and their calculated (n-dimensional) properties (XML, RDF, information retrieval, specialized graph descriptions and query languages)

                - evolution of the previous two examples with time

                - mapping semantically similar data (neural networks, learning systems)

                • 5. Re: Normalized vs. denormalized DB design in FileMaker
                  mardikennedy

                   I think it's also very important to evaluate the user environment before applying or ignoring the various theoretical recommendations.  The theoreticals generally apply to massive datasets, accessed simultaneously by many users, from multiple locations across the globe.  If those are your specific needs, then it's very important to leverage the good advice.

                   

                   If your user environment is different, then you may want to approach things differently.  For example, the theoreticals take into account the current technological issues.  Do those 'technologicals' matter for you at this time (eg remote users etc), or can you take a punt that by the time your users need a little more, the technology will have evolved to deliver nicely?  I certainly don't wish to recommend sloppy design, but I think pragmatism can sometimes result in a better experience for the user (both in the ux sense, and also in the maintenance/support required).

                   

                  Mardi

                  • 6. Re: Normalized vs. denormalized DB design in FileMaker
                    wimdecorte

                    mardikennedy wrote:

                     

                     I think it's also very important to evaluate the user environment before applying or ignoring the various theoretical recommendations.  The theoreticals generally apply to massive datasets, accessed simultaneously by many users, from multiple locations across the globe.  If those are your specific needs, then it's very important to leverage the good advice.

                     

                     If your user environment is different, then you may want to approach things differently.  For example, the theoreticals take into account the current technological issues.  Do those 'technologicals' matter for you at this time (eg remote users etc), or can you take a punt that by the time your users need a little more, the technology will have evolved to deliver nicely?  I certainly don't wish to recommend sloppy design, but I think pragmatism can sometimes result in a better experience for the user (both in the ux sense, and also in the maintenance/support required).

                     

                    Mardi

                     

                     Keep in mind though that re-writes (even partial ones, moving fields in and out of tables) are expensive and hard to sell.  Make sure that all decisions - especially those that may affect performance down the road - are properly documented.

                     

                     The flip side is of course Mardi's point: spending time and money on premature optimization is just wasteful.

                     

                     It's a fine balance; the issue should not be approached from a strictly theoretical angle, but from a thorough understanding of how FM behaves, especially around performance when showing or fetching data from more than one / two / three / ... hops away.

                    • 7. Re: Normalized vs. denormalized DB design in FileMaker
                      mardikennedy

                      +1 documentation.  Script comments are great because you can easily summarize the pros and cons.  RG (Relationship Graph) notes are incredibly useful but necessarily brief because the GUI is a bit ik.  Field comments are also good but I tend to use these less.  (Not sure why and I suspect I don't have a good reason.)

                      • 8. Re: Normalized vs. denormalized DB design in FileMaker
                        jurijn

                        Thanks Matthew O'Dell!

                         

                         I understand that when data travels over the web, all fields in a row are sent to the client. From this point of view it is better to have narrow tables.
                         At exactly which point in time is the data sent over the wire? If I have a layout with some fields, is the data sent when someone accesses that layout, and only for the rows that are currently viewed? That is how it should work, if I understand properly.

                         But on the other hand, data reporting practice (warehouse, star schema, etc.) teaches denormalization, so as not to have too many connections between data tables (more joins affect performance and slow down queries).

                         

                         I am not so experienced in FM, but I think that for the majority of projects / solutions the best approach is something in between.
                         That means something normalized to 1st or 2nd normal form. If we go further, we only get a more complicated data schema / model.

                         

                         What exactly did you mean by reasons to denormalize?

                        • 9. Re: Normalized vs. denormalized DB design in FileMaker
                          jurijn

                           That is one step further, into scientific models.
                           I am working in the area of ERP / CRM solutions and consulting.

                           

                          Thanks anyway, maybe someday in the future!

                          • 10. Re: Normalized vs. denormalized DB design in FileMaker
                            jurijn

                            I fully agree with you.


                             I have seen many solutions (FM DBs) that did not follow the rules and were not fully normalized.
                             We always need to work fast and time is limited. Could one of the reasons be that proper normalization takes a lot of additional time?

                             

                             Are there any other issues with small tables besides client access to a solution over the wire?

                            • 11. Re: Normalized vs. denormalized DB design in FileMaker
                              Mike_Mitchell

                              As soon as a client requests any field, the entire record is sent. It doesn't matter how many fields are shown on the layout.

                               

                              In the case of Form View, the records are sent in 25-record blocks.

                              In the case of List or Table View, as many records as will fit on the screen are sent, with more being sent as the user scrolls.

                               

                              So the view of the layout matters.

                               

                              Matt has already given you some examples of reasons to denormalize - unstored calculations and summaries will often negatively affect performance. Another is the use of related fields on a layout (especially portals), when used for searching or sorting.

                              • 12. Re: Normalized vs. denormalized DB design in FileMaker
                                Mike_Mitchell

                                I would disagree that proper normalization necessarily takes a lot of additional time. It generally depends.

                                 

                                 If you do it properly the first time, there are downsides, in that additional automation is usually necessary in the user interface so that users can insert data into the correct place in the data model.

                                 

                                The flip side is, if you do it wrong, then as the solution grows, you wind up having to do significant extra work down the road as needs change and you have to change the schema for "one more additional field" and performance starts to drag.

                                 

                                IMHO, it's generally better to do it right the first time.

                                • 14. Re: Normalized vs. denormalized DB design in FileMaker
                                  richardsrussell

                                  Your question illuminates the distinction between a science (reducible to a set of mechanistic rules and equations) and an art (requiring nuance and human judgment) and helps explain why there will be pretty good job security for database developers until the ultimate triumph of our computer overlords.
