4 Replies Latest reply on Jun 19, 2016 2:23 PM by PeterWindle

    Seeking advice on many versus fewer tables?


      I'm in the process of redesigning the solution I've built.  Without getting into too many details, the project has been one of those "make it up as you go along" projects and, as a result, is predictably bloated.  Now that we've more or less figured out what my client really wants and needs, I'd like to simplify the overall design and make everything web friendly.  When we started this project, they weren't interested in the web.  Now they want to access everything via web direct (go figure...), so some optimization is definitely required.  I will say, I'm quite impressed by web direct.  It handles my existing solution quite well.  One click indeed.


      Imagine a task manager with a variety of task types.  Each type has a few type-specific bits of information.  For example, one task is to issue a Purchase Order on a certain date.  When viewing that task, the users needs to see the PO number, dollar value, and whether the items on the PO are being delivered to the job site or the office.  When viewing a Punch Item, the user needs to see the Punch topic and summary and delivery information for items on the Punch, if applicable.


      At the moment these different task types each have their own table.  Each one links to a single Events table that stores the due date, task status, and completion date (we generate calendars from this table using SeedCode Subscribe).  When this project started, it made sense to have a different table for each task type as I thought we might build out additional type-specific features.  But I don't really see that happening.


      If I created a single Tasks table with five text fields and five number fields, I could replace all of the smaller type-specific tables.  Each task type would retain it's unique layout, but the underlying table could be the same.  If task type = "Purchase Order", textfield1 stores the PO number and numberfield1 stores the dollar amount.  If task type = "Punch", textfield1 stores the Topic and textfield2 stores the Summary.  Etc.  Some task types would use most of the fields.  Others might only use one or two.


      There are other common features that could be simplified as well.  For example, Purchase Orders, Punch Items, and Change Orders may all have related Deliverables records displayed in a portal on the respective layout.  Deliverables function the same for each of these task types.  Another common feature is the ability to attach documents and images to Punch Items and Purchase Orders (and, in the future, Submittals, RFIs, etc).  Although there is a single Deliverables table and single Documents table, there are multiple instances on the graph to accommodate for the fact that Purchase Orders are one table and Punch Items are another.  And all scripts related to managing Deliverables and Documents have to take that into account as well.


      The upside to creating a single Tasks table (with one relationship for Deliverables, one for Documents, etc) I see is simplicity.  It would greatly simplify my relationship graph and scripts.  The downside I see is creating one big monster Tasks table and losing the ability to further build out specific task types into their own modules.  That said, whereas I used to think Purchase Orders might one day grow into a full-fledged quoting system, I no longer believe that.  So is it overkill to have all these smaller tables?


      I also don't see combining Events and Tasks.  I still feel that date information should reside in a different table, but instead of linking individual task type tables (ie: Punch, Purchase Order) to Events, there would now be a single Tasks table.  The reason I don't want to further consolidate things by merging Tasks and Events is that we have a lot of non-task dates that we want to see on our Project calendars.  When such a date is added to the project, an Event record is created and linked to the project, but no task record is created.  If Events and Tasks merged into one table, at least 50% of the records would contain absolutely no task information, so I felt that keeping task information in another table made sense.


      Any and all thoughts appreciated.  Thanks!

        • 1. Re: Seeking advice on many versus fewer tables?

          A table should deal with one data entity. I'm a big fan of "skinny" tables.

          1 of 1 people found this helpful
          • 2. Re: Seeking advice on many versus fewer tables?

            I've often thought about the "best practice" for table design. I've seen a lot of solutions that have multiple tables that are almost identical and thought to myself, that's over-engineered.

            I have to be honest, in a lot of my solutions, I've often found myself using one table for many purposes. But as I gain experience, I have begun to see some benefits of separate tables - I have even considered "shadow" tables, which are a one-to-one relationship to existing tables to "extend" the usable fields, without disrupting existing tables... here are the things I consider before I build the multiple tables:


            Data in and data out:

            In a lot of cases, there may be many reasons to import/export data. Having a table that only has the relevant fields to it's purpose makes for simpler import/export setups. You don't have to dig through a thousand fields to figure out which ones you are focused on. Only the fields relevant to that table are visible at the time of import/export (or other data exchanges). If you design it properly, it could also save a lot of mucking about finding the data sets before exporting as well... you have to ask yourself, how long does it take to extract that set of records that contain the data I need, before exporting.



            Having separate tables makes for easier security setup. If there are different security concerns required for different data, it may be simpler to contain that data in a separate table. In the case of the "Shadow" table, it makes it easier to deliver a solution where the main table is "locked down" from user access and the "shadow" table allows users to add/remove fields without affecting the "main" table. Neat. Especially relevant when dealing with multiple file solutions.



            Some developers find it easier to deal with separate tables, some prefer the all in one approach. I know that with most of my solutions, I'm the only one ever dealing with maintenance so I can deal with whatever I've built - but, if your solutions go out to other developers to work on, consider how if affects them. Too may dependancies on data could complicate the programming experience and... as we all know, data is not always perfect.



            Especially important with multi-file solutions... can the tables be added / removed easily, depending on the solution? How does it affect the data...?


            The "nth" factor:

            The conclusion could be made... "why not". Sure, it may appear to be over-engineering however... so what. One day, this might be of some additional benefit...



            if 50% of your records are going to be blank data, I'd definitely consider separate tables. Design for the majority, not the minority (or "half" in your case !)


            My new rule is: "build the table to suit the data, not the data to suit the table."


            These are just my thoughts, I'm sure there are more things to consider... my opinion only. Disclaimer... disclaimer...

            1 of 1 people found this helpful
            • 3. Re: Seeking advice on many versus fewer tables?

              Thanks Rick and Peter for taking the time to offer your thoughts, especially Peter for the in-depth point of view.  I've been hard at work on the project this past week and have pretty much come to the conclusion that a single table for my task manager is appropriate.  On the one hand, I have these different types of tasks that could be separate tables given that they are different entities (ie: a punch item, a delivery, etc).  I made them separate entities in the first build because, like Rick, I've always believed that a table should exist for each data entity.


              However, as I built out the solution, I realized that I was more or less repeating myself, feature-wise, from one entity to another, while the entities themselves were quite similar.  When this project started, I imagined that Purchase Orders would develop into a full-blown module with accounting-specific features.  Same goes for the other modules.  But that hasn't happened and, after further discussions this week with my client, it has become clear that they don't see the solution moving in that direction.  The CFO is never going to cede control of generating Purchase Orders to the Filemaker solution, for example.  She and her team like Quickbooks just the way it is.  What is required is an effective solution for managing all the different types of date-driven tasks within a project, seeing them in a comprehensive view, filtering them, assigning them to users on a project, generating project calendars, etc.  The actual data stored for each task is rather minimal.


              So, I've changed my perspective a bit.  I no longer see these as different data entities (ie: a Purchase Order, a Punch Item), but rather one entity, a Task, of different types.  Thanks again for taking the time to offer your thoughts and advice!

              • 4. Re: Seeking advice on many versus fewer tables?

                The important thing here is that you're taking the time to think about it and plan it.

                Can't say enough about the planning phase. Due to FileMaker's excellent rapid development environment, far too many people build things without planning ahead, which often leads to nightmare maintenance later on down the track... good luck!