8 Replies Latest reply on Nov 5, 2013 10:01 PM by Oliver_Reid

    ExecuteSQL Speed Improvements

    CICT

      We've loved the opportunities that ExecuteSQL has opened up but on many occasions had to abandon use of it due to slow calculation speeds. However, we've been persevering and would welcome comments by people with far more experience than us on the following:

       

      We've carried out some controlled tests, that includes quitting FileMaker prior to running each test, as a second run of a query can often be much much faster than the first. Equally, all fields referenced have been indexed.

       

      Our tests were carried out on a horse insurance document management system that required the consolidation of data from many tables, but to try to keep it more generic, I've substituted the actual tables and fields to an 'Orders' and 'LineItems' generic example.

       

      The 'Orders' table consists of 5k records and 'LineItems' 13.5k records. A script was used to display a dialogue box containing the time it took to run the ExecuteSQL calculation. The SQL statement is built up within a Let statement, with the result being the ExecuteSQL calculation, so here goes:

       

      The following

       

      SELECT l.description, l.qty, l.unit, l.total

      FROM LineItems l

      INNER JOIN Orders o ON l.kIDOrd = o.kID

      WHERE o.kID= ? AND l.group = 'red'

       

      where ? is the Orders::kID parameter within ExecuteSQL and kID and kIDOrd are the local and foreign keys.

       

      We varied the joins, fields and parameters but this consistently took between 10 and 11 minutes to run. Operationally completely unacceptable.

       

      The problem appears to be that the above has to run the inner join across the 5k Order records, before isolating down to the unique order kID hence the time taken. If the number of order records are reduced, then the above calculates quicker. However, we don't know of many databases where the number of records gets smaller!

       

      In the above example we are only pulling records from the single sub-table, so in this case we can simplify the sql as:

       

      SELECT l.description, l.qty, l.unit, l.total

      FROM LineItems l

      WHERE l.kIDOrd = ? AND l.group = 'red'

       

      Across the 5k records this is almost instantaneous, but only works when selecting from 1 table.

       

      All documentation we've used recommends using JOIN, so we haven't tried multiple WHERE statements.

       

      Although we've been looking at nested options, the simplest solution to our speed problems has been to introduce something like the following:

       

      *asterisks only added to make it easier to see where the variable name is used

      Let ( [

      *ordID* = Orders::kID ;

      sql =

      SELECT l.description, l.qty, l.unit, l.total

      FROM LineItems l

      INNER JOIN Orders o ON l.kIDOrd = *ordID*

      WHERE o.kID= ? AND l.group = 'red'

      ;

      ExecuteSQL ( sql ; "" ; "" ; Orders::kID)

      )

       

      which can be further simplified to (but has had no noticeable effect on speed):

      Let ( [

      *ordID* = Orders::kID ;

      sql =

      SELECT l.description, l.qty, l.unit, l.total

      FROM LineItems l

      INNER JOIN Orders o ON l.kIDOrd = *ordID*

      l.group = 'red'

      ;

      ExecuteSQL ( sql ; "" ; "" )

      )

       

      By linking the inner join directly to the Let variable (Orders::ID), the 10/11 minutes has reduced to produce almost instantaneous results, as only the single required record of the 5k records is being assessed. We would have thought the WHERE statement and ExecuteSQL parameter would have done this, but these appear to run after the 5k records have been assessed by the standard INNER JOIN statement. By using the Let variable, this works for each record visited.

       

      We can see additional benefits of this by using 'IN' where a subset of IDs could be reported against and we have the options of combining data from many sub tables.

       

      I fully appreciate that this isn't suitable for wider reporting, but so much of what we do is combining information from various tables for a single record where SQL appears to be the best tool for the job and reduces the TOs required in the organisational chart.

       

      So, we were wondering whether anyone got a more acknowledged way of achieving the above? Can anyone see any pitfalls to the above? Does anyone have any other techniques they use to overcome the apparent slowness of ExecuteSQL in FileMaker?

       

      I look forward to your comments.

       

      Andy

        • 1. Re: ExecuteSQL Speed Improvements
          taylorsharpe

          Ahhh, the complexities of SQL and figuring out what is going on in the background.  I've found that I often have to experiment like you have to optimize things that are going slow.  Note that SQL is not any faster than native FileMaker functions, but you can sometimes optimize SQL to look through less info or other tricks to make things go faster. 

           

          In your case, you might experiment with the "IN" SQL function and Subqueries to reduce the set that outer SQL is calling on.  Obviously any time you avoid a join, things go faster.  Best of luck!

           

          e.g., Sub Query

                   

               SELECT *
               FROM CUSTOMERS
               WHERE ID IN
          (SELECT ID
               FROM CUSTOMERS
               WHERE SALARY
          > 4500) ;

          • 2. Re: ExecuteSQL Speed Improvements
            DavidJondreau

            I'm not 100% confident with SQL JOINs, but I don't see how using a Let() by itself would produce an effect. All the action still happens within the ExecuteSQL parens.

             

            Is that code the actual code? I'm assuming not because you haven't quoted the SQL statement, and your brackets are wrong, etc but just in case, you are separating the variable out from the rest of the statement right?

            The inner join line should be:

             

            "INNER JOIN Orders o ON l.kIDOrd ='" &  ordID & "'"

             

            Also, I've seen the tilde used to clarify variable names in Let() statements.

             

            Let([

            ~ord.ID = Orders::kID ;

            etc

            • 3. Re: ExecuteSQL Speed Improvements
              CICT

              Thanks David

               

              No, as mentioned, the original code was from a current project for an insurance document management system, but it is too bespoke and complicated to try to discuss.

               

              I'm hoping a few contributors will add their ideas as to how to speed ExecuteSQL up. There are plenty of pages out there showing standard structure for SQL statements and JOIN is similar to a TO link within FileMaker, i.e. 2 tables linked by at least 2 key fields and variants of. FileMaker is blindingly quick when taking advantage of this, but requires TO repeats all over the place, depending on the context you're requesting data from.

               

              SQL allows us to keep the TO graph leaner, but often with a trade off in speed. By introducing a single ID into the JOIN statement, we've 1 example as to how to skip a calculation through the complete record count, which means we can use SQL for other non reporting functionality. This can be expanded by using multiple joins referring to single ID values, rather than complete table to table join across all IDs.

               

              Taylor has suggested the IN statement as well. We have as yet to carry out similar controlled tests using this alongside JOINs to establish the order in which the calculations are assessed, but reducing the original data set as early as possible in the statement has to be the goal. As they say, 'there are many ways to skin a cat' and we'd be interested in knowing how people are approaching these issues.

               

              Anyone working with ExecuteSQL running in the data viewer, will know what trouble you can get into, again this idea would allow you to test your statement with a single record, before rolling across the whole data set.

               

              Andy

              • 4. Re: ExecuteSQL Speed Improvements
                jbante

                I agree that ExecuteSQL performance can be bad in certain pathological cases, I've found it to be an overall speed improvement. Though ExecuteSQL may not be faster than every operation we can achieve in other ways with other methods in FileMaker, I find it's almost always much faster if it spares me from having to switch layouts or open a new window, both of which are relatively slow (even if the window is frozen). When comparing ExecuteSQL with pulling data through relationships, the comparison is difficult. Relationships can be slow in pathological cases, too; there are things we can do with ExecuteSQL that we can't do at all with relationships alone; a head-to-head test in a simplified file doesn't accrue the performance consequences of a file that takes a relationship-heavy graph strategy; and for all we know, testing in a file with that relationship-heavy graph strategy may show that ExecuteSQL suffers the same consequences within that file.

                 

                As you pointed out, the particular example query you use could be expressed without any join, and folks should do that if they can. Giving the computer more to do will never make it faster. I haven't tested this, but I have a hypothesis that ExecuteSQL joins work best when the "WHERE" clause only matches on fields from the table first mentioned after the "FROM" keyword. How does your performance compare if you try this query:

                 

                ExecuteSQL (

                    "SELECT

                        l.description

                    ,   l.quantity

                    ,   l.unit

                    ,   l.total

                    FROM Order AS o

                    INNER JOIN LineItem AS l ON

                        o.id = l.id_Order

                    AND l.group = ?

                    WHERE

                        o.id = ?" ;

                    "" ; "" ;

                    "red" ;

                    Order::id

                )

                 

                 

                 

                I've also found once or twice that the fastest-performing solution for exotic joins is running the non-join portions of the query separately, and evaluating the equivalent of the joins by feeding those results into a custom function. I got a 10^3 magnitude performance improvement by doing this in one extreme case with an outer join.

                • 5. Re: ExecuteSQL Speed Improvements
                  TimDietrich

                  Andy --

                   

                  There are a couple of things about ExecuteSQL that sometimes makes it challenging to work with:

                   

                  We don't have visibility into the query execution plan that FileMaker is using. As a result, we have to guess at what it is doing before we can try to optimize it.

                   

                  There are times when FileMaker doesn't appear to be optimizing a query. Instead, it seems to come up with an execution plan and runs with it, without checking to see if an alternative plan might be more efficient.

                   

                  Your example is a good example of this. In your original query, you would think that FileMaker would see that the WHERE clause is going to restrict the result set (and significantly, too), and would therefore resolve that part of the query first. Instead, it looks like it performed the JOIN first, and then applied the WHERE claise to filter the result. (Your approach to optimizing this query is interesting, and I've never used it before. It's nice to know that we can this technique should we need to.)

                   

                  I ran into a similar situation and posted it about it last week. In my case, I was trying to get the maximum value of an indexed field. FileMaker struggled with the query, and the response time was unexpectedly slow. From what I could tell, it was scanning the table to determine the maximum value, instead of just looking at the index. (I was able to optimize this one by getting the distinct values of the field, sorting them, and grabbing the first value.)

                   

                  You wrote, "...reducing the original data set as early as possible in the statement has to be the goal." I think that in general, that's exactly what we need to do. Sometimes that requires crafting our SELECT statements in unusual ways.

                   

                  I hope that as ExecuteSQL continues to grow in popularity and use, we'll see more posts like yours that share what we've found and techniques that we've come up with. We'll all benefit by learning from each other, and getting a better understanding of how FileMaker has implemented ExecuteSQL.

                   

                  -- Tim

                  • 6. Re: ExecuteSQL Speed Improvements
                    CICT

                    Thanks to jbante and Tim for their additional contributions.

                     

                    Jbante, it looks like you may be on to something there. I need to do further tests and check the direct comparison, but an initial test reversing the SQL statement as you suggest has reduced the time from the 10/11 minutes to about 3 to 4 seconds, which is still 3 to 4 times longer than directing the Inner Join to the actual parent table ID, but a lot of food for thought there. Must admit, seems unnatural to be calling the child data from the parent table, but compared to the more acknowledged structure we originally tested, your suggestion is a huge speed improvement.

                     

                    Tim, it would be great if we could have a bit more insight into the FM implementation and, no doubt if you can afford the time and cost to go to DevCon, that information may be available.

                     

                    People such as Kevin Frank, Beverley Voth and John Sindelar, amongst others (apologies to those I haven't mentioned) have produced resources that has been a huge help in adopting ExecuteSQL and as we move so much of what we used to calculate into interface files, it should be our first tool of choice. However, when you read SQL developers from Microsoft and Oracle complaining about a delay in reporting on millions of records, then we grind to a halt with a few thousand it does get frustrating. Running ExecuteSQL on an external web based ODBC source is excruciatingly slow.

                     

                    We almost need a Brian Dunning type central resource for this where people can provide examples of good and poor practice, not just in terms of query structure, but for perfomance. We did pay an SQL developer to check some of our statements in case we were making obvious mistakes, but it seems we need to be a bit more inventive in our statements than he had to be.

                    Thankfully our development schedule is heavily booked at the moment, so trying to find time to work these things out ourselves is equally difficult.

                     

                    Again, thanks for your input.

                     

                    Andy

                    • 7. Re: ExecuteSQL Speed Improvements
                      Oliver_Reid

                      I am sure I am missing something but

                       

                      SELECT l.description, l.qty, l.unit, l.total

                      FROM LineItems l

                      WHERE l.kIDOrd= ? AND l.group = 'red'

                       

                       

                       

                      Would get the same result with no join at all , no?

                      • 8. Re: ExecuteSQL Speed Improvements
                        Oliver_Reid

                        Regarding speed : FM's file structure is a fundametal limitation on the speed at which data can be extracted form mutiple reocrds across mutiple tables. The more stored fields in a table, the less efficientsuch operations become, as a  whole row must be loaded to get at the value in one field. (By contast, Oracle can in general read down a column as fast as it can read along a row.)

                         

                        Execute SQL provides economy of programming complexity in many cases, compared to using conventional  Filemaker methods to get the same result, but I have found any speed gains to be negligible.

                         

                        Btw be careful with "IN". E.g.,

                         

                        SELECT <query1>

                         

                        WHERE <result element of query1> IN  <query2>

                         

                        wwill result in query2 being excecuted over and over for every row output from query1.

                         

                        Better to set a variable take the value of the result of query2 and then use the variable after the IN.