17 Replies Latest reply on Apr 9, 2015 12:56 PM by DavidJondreau

    How big is your database?

    tmslayton

      Just wondering what your database performance has been like as your data grows. My database has about 15 GB of data and 340 GB of documents. If I bring that data into FileMaker Server, can it handle it okay? The spec says the server can handle up to 8 TB of data, so I'm thinking it could in theory, but what about in reality?

        • 1. Re: How big is your database?
          siplus

          To start with, I would recommend investigating all ideas that help you reduce the size of your document data.

           

          Just one example, from one of the XX databases in our solution: we went from a 10 GB file to a 46 MB file by storing only the path to the files and displaying them via Web Viewers. (Unfortunately you can't have a Web Viewer in a portal. Yet.)
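
          As a minimal sketch of that approach (the field name here is just an example), the Web Viewer's web address can be a small calculation on a stored path field:

          // Web Viewer > Web Address
          If ( not IsEmpty ( Documents::FilePath ) ;
              Documents::FilePath ;   // e.g. "file:///Volumes/Docs/invoice.pdf" or an http:// link
              "about:blank"
          )

          The database then holds only a short text path per record, while the files themselves live on a file share or web server.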

           

          But anything that FM can handle, FM Server will handle better and faster, with automatic backups, scheduled scripts, and so on.

          • 2. Re: How big is your database?
            Mike_Mitchell

            This is two questions:

             

            1) Can the server handle 15GB of data w/340GB of documents?

             

            2) Does such a database provide acceptable performance?

             

            The answer to (1) is yes, absolutely.

             

            The answer to (2) is very dependent on how the database is designed. That's up to you. Here are some considerations:

             

             1) Store data. Avoid unstored calculations, summary fields, aggregate calculations, etc. Substitute triggered scripts or batch jobs that update indexable fields for this functionality (see the sketch after this list).

             

            2) Use "narrow" tables (few fields). Strip out any fields in tables where they aren't needed. This is because when the client requests a record from the server, the entire record downloads, even if only a single field is shown on the layout. More fields, more data, longer download time.

             

             3) Avoid sorting portals. When you sort a portal, the entire related set has to be downloaded. The same goes for List views; avoid sorting them where you can.

             

             4) In many cases, ExecuteSQL will be slower than native FileMaker functionality. Use it where it's helpful, but don't assume it's a performance booster. This is especially true if you have an open record on the client when the ExecuteSQL function executes (props to Wim DeCorte for this revelation).

             

            5) Leverage Perform Script on Server to update records in batches or avoid downloading large numbers of records.

             

            6) Use external container storage for your documents. (You're probably already doing this.) This allows the server to leverage streaming technologies.

             

            7) Use the FileMaker theme structure for your layouts. Save all style changes to a style, and then back to the theme, to minimize network traffic for styling updates.
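
             As a quick sketch of point 1 (table and field names here are made up, not a prescription): instead of an unstored calculation such as Invoice::Total_u = Sum ( InvoiceLines::Amount ), define a plain, indexable number field Invoice::Total_s and keep it current with a small triggered script along these lines:

             # Script: Update Invoice Total  (hypothetical)
             # Attach via OnRecordCommit to the Invoice layout where line items are edited in a portal
             Set Field [ Invoice::Total_s ; Sum ( InvoiceLines::Amount ) ]
             Exit Script [ Text Result: True ]   // True lets the commit proceed

             Because Total_s is a stored number field, it can be indexed, searched, and sorted without touching the related line items at find time.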

             

            There are other considerations that I can't recall at the moment, and the specifics of your solution will create other opportunities, but that should get you started.

             

            HTH

             

            Mike

            • 3. Re: How big is your database?

               The current tendency is to cram everything into one file automatically.

               

               Anyone who was a developer before FileMaker 7 is familiar with using many files; I had 110 in one solution.

               

              Your database may function better if broken into separate parts since you can design user interfaces to work with each.

               

               For instance, I have a database that imports URL bookmarks, and I store the first instance so I can delete any duplicates that appear (should I reimport an older file). This file grows and grows, and since I am importing from Craigslist, a lot of the URLs become dead wood.

               

               So this is a standalone file. When I finish an import, update, etc., I import the usable URLs into the main database or just link to them. This keeps the main database as slim as possible. It reduces the number of records added to and deleted from the main database. It also reduces the number of layouts and scripts in the main file, since those live in this one, which reduces developer confusion.

               

               Modular work files are great tools, since you can do all the dirty work in them and then export static data to your main file, which will improve its speed and efficiency.

              • 4. Re: How big is your database?
                Markus Schneider

                 Thanks - this should be one of the 'sticky' threads!

                • 5. Re: How big is your database?
                  alecgregory

                   This advice is generally sound, but I would like to add a few observations and go into more detail on some points, as I think some of the performance optimizations regularly mentioned can be misinterpreted.

                   

                  1) Store data. Avoid unstored calculations, summary fields, aggregate calculations, etc. Substitute triggered scripts or batch jobs that update indexable fields for this functionality.

                   I think this is too general. We have unstored calcs for a good reason; the key is not to use them badly and to remember that, whether a calculation is stored, unstored or just a regular field updated via a batch process, the pain of evaluating it has to be borne somewhere. Sometimes the best place for that pain is on the client machine when the user requests the unstored calc field. Unstored calcs are not evaluated unless they are specifically called in another calc or placed on a layout, so they prevent a table from becoming unnecessarily bloated. Sure, you can use triggered scripts and batch jobs, but they are tedious to maintain and can fail due to record-locking issues. Not that you shouldn't use these methods at all, but I think sometimes the work required to use them successfully and maintain them is underestimated.

                   

                  2) Use "narrow" tables (few fields). Strip out any fields in tables where they aren't needed. This is because when the client requests a record from the server, the entire record downloads, even if only a single field is shown on the layout. More fields, more data, longer download time.

                   A properly designed and normalized schema can easily lead to wide tables, and that's often not a bad thing, especially if the volume of data in each field is small. As soon as you start splitting tables up for performance reasons rather than traditional database-design reasons, a lot of things become more complex, such as maintainability (where do these new fields go?) and exporting (so I need to go to three tables just to do my basic product export, why?). Sometimes it is worth it - working with list views over a WAN, for example - but a lot of the time it isn't.

                   

                   4) In many cases, ExecuteSQL will be slower than native FileMaker functionality. Use it where it's helpful, but don't assume it's a performance booster. This is especially true if you have an open record on the client when the ExecuteSQL function executes (props to Wim DeCorte for this revelation).

                  But let's not forget where ExecuteSQL is a performance booster...

                  1. Counting records with COUNT ( * )
                   2. Getting a single field or small groups of fields from a wide table (it appears that FileMaker doesn't download whole records in order to return single-field results)

                   It can also reduce the number of TOs needed in your solution by running some summary functions out of context. If you have a dashboard or report with many aggregate values, an SQL query with a GROUP BY clause can be very efficient.
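
                   A couple of minimal sketches of those uses (table and field names are made up; the second and third arguments of ExecuteSQL are the column and row separators):

                   // 1. Count records without a summary field or a change of found set
                   ExecuteSQL ( "SELECT COUNT(*) FROM Orders WHERE Status = ?" ; "" ; "" ; "Open" )

                   // 2. Pull a couple of columns from a wide table
                   ExecuteSQL ( "SELECT OrderID, OrderTotal FROM Orders WHERE CustomerID = ?" ; "|" ; ¶ ; Customers::CustomerID )

                   // Aggregates for a dashboard, with no extra TOs
                   ExecuteSQL ( "SELECT Status, COUNT(*), SUM ( OrderTotal ) FROM Orders GROUP BY Status" ; "|" ; ¶ )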

                   

                   I guess what I'm trying to say overall is that performance optimization involves making hard choices and then living with them throughout the life of the system, rather than just going through a checklist of dos and don'ts.

                  • 6. Re: How big is your database?
                    Mike_Mitchell

                    First, it's a general response to a general question. So complaining it's "too general" seems rather pedantic.  

                     

                    Now, to the specifics of your reply:

                     

                    Unstored calculations evaluate at the client, and only when demanded. So, in a way, they can help you (since they don't impose a network penalty - nothing to download). Unfortunately, the downsides usually outweigh the upsides.

                     

                    You are correct in stating that all calculations impose a penalty; it's just a matter of when. For example, Conditional Formatting imposes a penalty when it renders (specifically, whenever refreshed). However, a stored calculation will only update when one of the feeders changes. Otherwise, it doesn't impose a penalty. Hence they are (again, in general) more efficient than unstored calculations (because the conditions that cause them to update occur less frequently). It's especially true since many unstored calculations are unstored because they reference related data and therefore have to be evaluated at runtime, cannot be indexed, and lead to people searching on them (which is a performance killer).

                     

                    "Not that you shouldn't use these methods at all, but I think sometimes the work required to use them successfully and maintain them is underestimated."

                     

                    So ... your argument is as a developer, you should make your own life easier at the expense of the user?

                     

                    "A properly designed and normalized schema can easily lead to wide tables, and that's often not a bad thing, especially if the volume of data in each field is small. "

                     

                    Again, this can be true, but it's the exception. General design practice (including virtually all performance sessions at the last several DevCons) is to stick with narrow tables - or even to split out infrequently updated fields into separate tables to minimize cache refreshes on fields that don't need it. I can remember this as a best practice as far back as 2008.

                     

                    "But let's not forget where ExecuteSQL is a performance booster...

                    1. Counting records with COUNT ( * )
                     2. Getting a single field or small groups of fields from a wide table (it appears that FileMaker doesn't download whole records in order to return single-field results)"

                     

                    The first one CAN be true in some cases. The second is conditionally true. As I mentioned before, whether you have a record open on the client or not makes a HUGE difference. If you don't, then the records don't have to be fetched because the query is executed on the server and only the result is returned. If you do, then all the records involved in the query have to be fetched (because they have to be compared with the local values). The results are cached, so subsequent queries will be faster.

                     

                    I never said ExecuteSQL was "always" less efficient. I said you shouldn't assume it was a performance booster, and that it was slower in many cases. And it is, because FileMaker doesn't use SQL as its native query language. SQL queries have to be translated into native FQL before they can be executed - which puts SQL at a disadvantage. Additionally, the open / closed record issue will not affect native FileMaker functionality as much, because in all likelihood, you've already cached most of the records in the calculation. You've paid that price previously.

                     

                     These are principles to keep in mind. They aren't guarantees. Yes, you have to make design decisions. Yes, there are tradeoffs. That doesn't mean there's no such thing as "best practices". Exceptions don't disprove rules.

                     

                    Cheers.

                    • 7. Re: How big is your database?
                      BobGossom

                      Mike,

                       

                      Lots of good responses. +1 on narrow tables, specifically to drive value lists.

                       

                      A few things I might add:

                       

                      -Large files are more vulnerable to corruption and disk errors. It's rare, but when a server malfunctions and a file is damaged, it's usually the largest one that takes the hit. It's worth spending some time on your structure to see about isolating your largest data tables into their own file. EXAMPLE: We have an accounting product where we've done this. One client had their largest file - the General Ledger - go down on the server. They didn't even know it for two days, when they tried to run a GL report. It took us another day of investigation and recovery to implement a clone. The rest of the system was up during all of that time, and no user had significant downtime.

                       

                       -Unstored calcs can help significantly with file size, as can turning off indexing where it isn't needed. As the record count goes up, the index of a single field can take hundreds of megabytes, even gigs. As developers we often go into the background and perform finds on fields that aren't relevant to users, forcing an index that isn't needed. We often create "audit" calcs to test the integrity of the system (finding dups, etc.). These generally need to be stored and indexed while we work with them, but no user touches these fields, so when we're done, we turn them back to "unstored" and eventually recover the disk space. If we need them again, they are already in the system, but since they are unstored, they take virtually no resources…until needed the next time. While accessing large data sets with unstored calcs can bring your system to a screaming halt, there are often cases where the interface naturally interacts with only small record counts, or reports that are run very, very infrequently. In these cases unstored calcs can appropriately reduce your file size.

                       

                      -Off-load the reporting. We often develop summary tables (weekly/monthly/client/product) that are used to generate reports that don't need granular data. Regular business rules or server side scripts can keep the summary tables in sync. This not only speeds the reporting up exponentially, but can reduce the size of the data table/file.
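
                       As a rough sketch of how such a table can be kept in sync (names here are made up; the script runs as a server-side schedule or via Perform Script on Server):

                       # Script: Rebuild Monthly Sales Summary
                       Go to Layout [ "MonthlySales" (MonthlySales) ]
                       Show All Records
                       Delete All Records [ With dialog: Off ]   // throwaway rows, rebuilt on every run
                       Set Variable [ $rows ; Value: ExecuteSQL ( "SELECT YearMonth, SUM ( Amount ) FROM InvoiceLines GROUP BY YearMonth" ; "|" ; ¶ ) ]
                       Set Variable [ $i ; Value: 0 ]
                       Loop
                          Set Variable [ $i ; Value: $i + 1 ]
                          Exit Loop If [ $i > ValueCount ( $rows ) ]
                          Set Variable [ $row ; Value: Substitute ( GetValue ( $rows ; $i ) ; "|" ; ¶ ) ]
                          New Record/Request
                          Set Field [ MonthlySales::YearMonth ; GetValue ( $row ; 1 ) ]
                          Set Field [ MonthlySales::Total ; GetValue ( $row ; 2 ) ]
                       End Loop
                       Commit Records/Requests [ With dialog: Off ]

                       Reports then read a handful of MonthlySales records instead of crawling the full line-item table.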

                       

                      Bob Gossom

                      • 8. Re: How big is your database?

                        And one general thought:

                         

                        Rules made by others and thought to be the holy grail of database design can be ignored if they don't work for you.

                         

                         Case in point: on one list, a newbie asked for help designing a report with photographs; this was assigned to him by an employer who handed him a new box of FileMaker and told him to create a file and go photograph his property. Sadly, he asked this question on a list with people who were eager to post a reply but not really able to act as a consultant and provide a good answer. The replies ranged from SQL to web browsers to Java. Six weeks later the newbie was lost and still had not created the report.

                         

                         Eventually I responded that the report could have been created in five minutes using one table, and the employer would have been quite happy. The report could be made hierarchical, with breaks for floor, apartment, room, fixture, etc., and multiple photographs of each. One table. If you understand FileMaker and leverage that knowledge.

                         

                         That table would have many, many fields and each record would contain redundant data in order to make the report breaks, but it would work, it would answer the problem, and it would get the employee a raise for being so productive. Instead, the list answers ended up making him look incompetent.

                         

                        There are different needs and thus arbitrary rules don't always work.

                         

                        Maximizing the speed of a megalithic corporate database is one thing while maximizing the speed of getting the job done for a small firm is quite another.

                        • 9. Re: How big is your database?
                          erolst

                          Isn't it getting just a wee bit tiresome telling the same story time and again?

                          • 10. Re: How big is your database?
                            Mike_Mitchell

                            Hey, Bob.

                             

                             Your responses are appreciated. Of course, we're drifting slightly off the original topic (performance) and into areas of database size and stability. These are very important (maybe even more so than performance).

                             

                            On the unstored calc issue: Why is it whenever you point out the performance weaknesses of unstored calcs, the defenders of unstored calcs come out of the woodwork?  

                             

                             Yes, of course, you can reduce file size by removing indexing (which is what making a calculation unstored amounts to). The point being, within the context of performance (which means: user interaction), they should be minimized or avoided.

                             

                            And I agree: Offloading reporting / summary tables to a separate file is a good idea.

                             

                            Thanks again.

                            • 11. Re: How big is your database?
                              Mike_Mitchell

                              I don't believe best practices are "arbitrary rules". They're the results of years of experimentation and real-world experience on the part of the community.

                               

                               A big part of my day job is inheriting databases built by "citizen developers" - people with little or no training in database theory, programming, etc. They do what you suggest: slap something together that "just works". Inevitably, the result is a mess: a flat-file database with 500+ fields, calculations all over the place, and multiple internally stored container fields - a database that performs very poorly, is unstable, and is an unholy mess to maintain.

                               

                              If you're building something just for your own use, or for a short-term throwaway purpose, then fine, ignore the wisdom of hundreds or thousands of experienced people. But I'm not about to take that risk with something that's going into production for business use.

                              • 12. Re: How big is your database?
                                siplus

                                I agree with Mike: if somebody's flexibility extends to ignoring basic principles, then it means he's got no principles at all, and that's not good. In other words, even being capable of thinking about it is imho wrong and counterproductive. The windows you leave open will slam at the first wind, hitting you with more work to do and hitting your reputation too.

                                • 13. Re: How big is your database?
                                  DanielShanahan

                                  This is a good list, Mike.  Thanks.

                                  • 14. Re: How big is your database?
                                    alecgregory

                                     Thanks for the reply Mike. The point I was trying to make was subtler than the tone of my post suggested. Let me try again!

                                    You are correct in stating that all calculations impose a penalty; it's just a matter of when. For example, Conditional Formatting imposes a penalty when it renders (specifically, whenever refreshed). However, a stored calculation will only update when one of the feeders changes. Otherwise, it doesn't impose a penalty. Hence they are (again, in general) more efficient than unstored calculations (because the conditions that cause them to update occur less frequently). It's especially true since many unstored calculations are unstored because they reference related data and therefore have to be evaluated at runtime, cannot be indexed, and lead to people searching on them (which is a performance killer).

                                     I am basically saying that this argument is overly simplistic. It's no real help to say that "in general" unstored calcs are worse for performance than stored calcs (or even basic text or number fields updated in bulk). If you are talking in terms of advice you'd give to less experienced developers, then perhaps you could say this, but I'd still argue that it's better to teach newer developers to interrogate their choices rather than generally avoid something as crucial as unstored calcs.

                                     

                                     I think unstored calcs play an important role that is often underestimated when performance optimization is discussed. To put this in the context of your paragraph above, it's not necessarily true that the conditions triggering updates to stored calculations occur less frequently than for unstored calcs. Take, for example, a calculation that references 20 local fields, some sort of automated description (perhaps in HTML) of an entity that includes a title, subtitle, some dates, booleans and some fairly hefty descriptive text fields (a person's biography, for example, or a feature article for a magazine). Every time one of those 20 fields is changed, the stored calculation will be recalculated. That's a fair amount of processing. Additionally, if the calculation is stored, file size will be increased by the amount of data in those 20 local fields, multiplied by the number of records in the table, which for large record sets can easily amount to hundreds of MB.

                                     

                                    So should this calc be stored or unstored? The answer entirely depends on how often you will access that field. If it's on a layout that is accessed around once a month on average, then almost certainly not. If it's on a layout that's accessed regularly by users, and if they want to search in it, then yes, it should probably be stored. I would say both scenarios are equally plausible and the answer is different for each.

                                     

                                    The way I approach unstored calcs (and this is probably for another thread) is that they work best when considered as code that would usually be in the "Controller" of an MVC application, essentially working as functions that return data in a format useful to the "View" of the data. In the FileMaker world we don't really have the ability to use controllers in the traditional way, so we have to use the FileMaker schema. This is a bit troubling, but less so if we know we're not storing actual data, just references.

                                     

                                    "Not that you shouldn't use these methods at all, but I think sometimes the work required to use them successfully and maintain them is underestimated."

                                     

                                    So ... your argument is as a developer, you should make your own life easier at the expense of the user?

                                     My point is that you have to be sure you can maintain something before you commit to it. Otherwise the user will suffer from the developer biting off more than they can chew. It may be that the solution ends up hurting the system more than the original problem it was intended to solve. My worry is that raw performance-based approaches are advocated in a way that downplays the complexity required to maintain them, and that can be dangerous.

                                     

                                    "A properly designed and normalized schema can easily lead to wide tables, and that's often not a bad thing, especially if the volume of data in each field is small. "

                                     

                                    Again, this can be true, but it's the exception. General design practice (including virtually all performance sessions at the last several DevCons) is to stick with narrow tables - or even to split out infrequently updated fields into separate tables to minimize cache refreshes on fields that don't need it. I can remember this as a best practice as far back as 2008.

                                     Yes, they love suggesting this at DevCon and in technical papers. The performance testing I've done on this suggests that it's a double-edged sword at best. For example, it causes performance issues with finds, because a find that once took place in a single table is split across multiple tables. Of course, I may very well be in the minority here, and I don't disagree with thin tables in principle; I've just never found them to help in any scenario I've encountered.

                                     

                                    "But let's not forget where ExecuteSQL is a performance booster...

                                    1. Counting records with COUNT ( * )
                                     2. Getting a single field or small groups of fields from a wide table (it appears that FileMaker doesn't download whole records in order to return single-field results)"

                                     

                                    The first one CAN be true in some cases. The second is conditionally true. As I mentioned before, whether you have a record open on the client or not makes a HUGE difference. If you don't, then the records don't have to be fetched because the query is executed on the server and only the result is returned. If you do, then all the records involved in the query have to be fetched (because they have to be compared with the local values). The results are cached, so subsequent queries will be faster.

                                     Yes, you definitely have to handle this; sometimes the only way to properly compare uncommitted and committed data is via an ExecuteSQL query, so it's desired behaviour. But yes, if you're doing a big summary call, you'll want to commit records first.
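
                                     Something as simple as this (names made up) usually covers it:

                                     # Commit first so no open record forces the client to fetch and compare records
                                     Commit Records/Requests [ With dialog: Off ]
                                     Set Variable [ $openOrders ; Value: ExecuteSQL ( "SELECT COUNT(*) FROM Orders WHERE Status = ?" ; "" ; "" ; "Open" ) ]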

                                     

                                    I never said ExecuteSQL was "always" less efficient. I said you shouldn't assume it was a performance booster, and that it was slower in many cases. And it is, because FileMaker doesn't use SQL as its native query language. SQL queries have to be translated into native FQL before they can be executed - which puts SQL at a disadvantage. Additionally, the open / closed record issue will not affect native FileMaker functionality as much, because in all likelihood, you've already cached most of the records in the calculation. You've paid that price previously.

                                     You're right, SQL can be slower than native queries, and often the ExecuteSQL gain is due to other bottlenecks on the FileMaker side besides query speed, such as script triggers or summary fields on the layouts where the FileMaker find is performed, or the fact that you need more TOs in your solution to support the FileMaker queries, which leads to bloat.

                                     

                                     These are principles to keep in mind. They aren't guarantees. Yes, you have to make design decisions. Yes, there are tradeoffs. That doesn't mean there's no such thing as "best practices". Exceptions don't disprove rules.

                                     In the FileMaker world there aren't many best practices, unfortunately. Even the FileMakerStandards.org site, which has some of the best FileMaker developers working on it, does not list many, and I think this is partly because there are too many differing use cases of FileMaker for them to be useful to the majority of developers.

                                     

                                    Cheers,

                                     

                                    Alec
