6 Replies Latest reply on Jul 6, 2012 6:11 AM by thirdsun

    Fast list view & sorting vs. redundant data - what's your approach?




      I'm interested in learning how others approach the problem of working with large sets of related records in a WAN hosted solution. It should be noted that this particular DB is still on FM11 - we also work with FM12 in another solutions, but this post is not about the infamous scrolling issues some of you may have experienced with v12.


      As an example I'd like to use a table in our solution that tracks stock movements / transactions by collecting sales, in- & outgoing shipments and so on. You can imagine that there are a lot of records - those are presented in a list view that mostly shows found sets as specified by the user. Users often use search criteria that is hard to predict or pre-define as it would need to cover a wide range of search filters and sort orders. Therefor I prefer a dynamic approach to this and let the user decide how filter and sort the records.


      In the past, I stored data that would be relevant to searching, sorting, grouping and aggregating operations within the Transaction table via lookup values. This included data like product names, manufacturer, order options etc. - of course I know that this isn't the way to model a database since that kind of data should be left in their respective tables in order to reduce redundancy and maximize data integrity. I'd like to get rid of this approach but what is the alternative? Displaying related data like manufacturers, product names or store names causes the list be almost impossible to scroll. At this point I should probably mention that all my data is indexed. Furthermore I rarely use calculations at all (covered by updating fields via ScriptTrigger), but if I have to they are stored/indexed calculations. I prefer the idea of integer values and summary fields, so that I'm able to get totals from the same table based on the relationship and TO I'm using to query - this is very flexible.


      The related data is indexed too and as you might imagine connected via foreign keys like id_product, id_store, id_lineitem and so on - no fancy stuff here. Yet scrolling is so slow once simple things like the product name from the Product Table is shown in the list. The same applies to sorting by related fields. There's no point in storing that data twice - well, apart from the performance boost. This however would ignore the purpose and idea behind relational databases and normalization.


      I should add that every operation that just involves fields from the the Transcation table is fast as expected. We're talking about 200,000 and more records that are almost instantly sorted and aggregated. In fact even aggregating those movements for various sizes (in fact almost 20, the products are mostly shoes) from the products table works as expected. A product's stock for every store and every size is returned in about 1-2 seconds over WAN (about 10 MBit/s) by aggregating every in- & outgoing movement since the last inventory count (once a year).


      Well, back to the performance problems - I'd really like to hear how you handle those situations we're large lists with related data are sorted, searched and aggregated - I can't imagine that I have to choose between a clean db structure and satisfying performance. What is your solution to get both?


      Thanks for your time and input.

        • 1. Re: Fast list view & sorting vs. redundant data - what's your approach?

          Webviewer, filemaker is unable of dealing with long list, especially in 12 that's the terrible truth

          • 2. Re: Fast list view & sorting vs. redundant data - what's your approach?

            When I was doing my databases course at Uni, it was repeatedly enforced that performance trumps structure. So I think that if everything is done in FileMaker, your current approach is the best solution.


            If you want better performance while maintaining the clean data structure, you're probably best off having all the processing done on the server and the results pushed to the client - such as instant or custom web publishing. IWP is quite possibly sufficient for your needs, but if not then use the PHP Site Assistant to generate a sample report and see if that proves fast enough; you can customise it for the right data once you're sure it will do the job.


            If you absolutely have to do it all in FileMaker, I would investigate a solution where the client stores data locally and synchronises changes with the centrally hosted tables as required - 360Works have been advertising their MirrorSync product recently, which works without plugins and is apparently easy to integrate into an existing solution.

            • 3. Re: Fast list view & sorting vs. redundant data - what's your approach?

              thirdsun -


              What you're seeing is a result of the "record-centric" design of FileMaker. Whenever a client touches a record, it receives the entire record - ALL the fields, whether they're displayed on the layout or not. Hence, the best thing you can do in the situation you're describing to speed performance would be to minimize the number of fields in the related table. This can be accomplished in a couple of ways.


              1) You can eliminate fields that aren't really needed. These might be calculation fields used for display, summary fields that aren't really needed except for reporting, or "legacy" fields brought up from an older version. If you have more than, say, 30 or so fields in a table, then there's probably something broken in the data model. (I inherit solutions from people all the time with 500+ fields in a table. They're really ugly.)


              2) You can use a one-to-one relationship and split the related table so that the frequently-used fields live in one half, and the not-so-frequently used fields in another half. This has the additional advantage of utilizing client caching. Whenever a record is updated by one client, it is downloaded (during idle times) to other clients who touch it. This caching behavior helps with response time. However, there's only so much cache space, so if records are being updated frequently, then lots of traffic ends up going over the wire updating the cache - which negates its benefit. If you split the table out so that frequently-updated fields are in one table, and not-so-frequently updated fields are in another, then only the fields that are changed end up loading to the cache. This will speed things up.


              From what you're describing - that the Transactions table loads fast, but the related list view doesn't - I strongly suspect that the table from which the related fields are being drawn has some mondo calculations or just a lot of fields. Just remember the record-centric behavior and it'll help you remodel the database so you don't have to trade a proper data model (per se) for performance. You just need to keep in mind to minimize the amount of traffic going over the wire in a WAN situation, where bandwidth is limited.


              Note: You mentioned aggregating. If you're using aggregate functions, that will REALLY slow things down, because the entire found set must be downloaded to process the aggregate calculation. Consider another method to get your aggregate calculation - perhaps a script or a scratch table that puts just the field(s) being aggregated in one place to minimize the damage.





              • 4. Re: Fast list view & sorting vs. redundant data - what's your approach?

                Thanks for your input.


                I already tried to get rid of everything that isn't necessary. Calculations are very rare in my solutions - If something needs to be calculated it is done via Script Trigger. When it comes to the number of fields though, there isn't much left to reduce - we have quite a special case here as the majority of products we're dealing with in our solution are shoes - and those shoes have a size variation set that is much larger than your average shirt sizes (XS-XXL) for instance.. I considered keeping all product data related to individual sizes in a separate table as you would do with other products like the mentioned example of shirts/clothing. However whenever we show product information - like the current stock, orders, sales and so on - we always need that data for the individual sizes which means there is no reason to put the stock, orders, sales for those sizes in another table as it would have to be fetched anyway. And as you know there are quite a lot of shoe sizes.


                Furthermore we need sums of this data and we'd rather have them in a flexible way which makes aggregated/summary-fields an ideal solution since this enables us to filter the same set of records from different perspectives just by setting up the relations. As mentioned before I need that data and I need it to be shown which renders a separated table useless.


                So those fields for the individual sizes and their respective summary fields already put the field count above 30. The rest usually are keys, dates & timestamps and flag fields (usually in boolean format) - it is kept very simple, there aren't any fancy calculations. With the usual keys and boolean option fields in combination with a constant 1 and 0 value in every table we are able to filter and look at sums in any way you can imagine. It's very flexible, simple and fast - it's just the list view that doesn't seem to handle relations very well. I have to admit that I think this very disappointing in a modern RDBMS.


                If I had more experience in web development I'd prefer Rails application any day, however I really like rapid workflow of FileMaker.

                • 5. Re: Fast list view & sorting vs. redundant data - what's your approach?

                  Well, if those are your requirements, then you're about as good as you're going to get by echoing the data. FileMaker's model will load all fields in the related record, even if it doesn't need them, as soon as you request a display of any related field. And it will do it as soon as it needs to when that record comes into view in a list view. Aggregate functions will only make it worse - way worse - because the entire found set has to be loaded in order to perform the calculations.


                  What you're seeing is a result of bandwidth restrictions on the WAN, as well as a desire to have maximum flexibility and everything calculated "on the fly". Those two simply have a performance penalty. Upgrading to 12 will probably help, since a lot of calculation overhead was moved server-side and bandwidth demand was greatly reduced, but you may be looking at redesigning your layouts to see the full benefit. I haven't seen the much-maligned list view slowdowns, but others have, mostly with converted solutions. I haven't seen too many reports of list views built directly in 12 causing problems.





                  • 6. Re: Fast list view & sorting vs. redundant data - what's your approach?

                    I should add that aggregated fields in list views are only displayed if they belong to the base table. Related aggregates are way too demanding in bandwith and of course I'm aware of that.


                    Apart from that your answer mirrors my experience - I just felt as if I was missing something since this behaviour doesn't really encourage clean relational database design. Thanks for your time and efforts.