1 2 Previous Next 22 Replies Latest reply on Jan 29, 2016 6:34 AM by BruceHerbach

    ExecuteSQL  DISTINCT?

    BruceHerbach

      I would like to get a set of statistics from a table so that it shows values for each client once.  There are multiple records for many of the clients, but the reporting requirements require results based on both all records and distinct by client.   Here is the initial query:

       

      SQLResult = ExecuteSQL (

      "SELECT COUNT( DISTINCT a.\"ClientID\"), SUM (a.\"ServiceUnitsApproved\" ), SUM( a.\"ServiceUnitsDenied\"), SUM( a.\"ServiceUnitsPending\")

      FROM \"Services\" a

      WHERE a.\"ServiceDate\" >= ? AND a.\"ServiceDate\" <= ? AND a.\"ServiceReceived\" = ?" ;

       

      The results are:

      Count     Approved    Denied     Pending

      150           26                  4          129

       

      The Count is correct.  However the values for Approved, Denied and Pending are based on all records and total 159.

      Is there a way to set the query up so that I can get the correct sums for Approved, Denied and Pending?

       

      In the interest of full disclosure,  I used Seedcodes SQLExplorer to develop the initial query and then tweaked it to get Counts and Sums.

       

      Currently to get the correct results,  I resorted to standard FileMaker scripting, looped through the records and omit the duplicate IDs. 

        • 1. Re: ExecuteSQL  DISTINCT?
          beverly

          What should be the "correct" sum results?

          beverly

          • 2. Re: ExecuteSQL  DISTINCT?
            BillisSaved

            Good morning Bruce,

             

            I hope your day is going well. I just want to make sure I understand the issue you're having, before I contribute my 2 cents. It appears you have 150 unique clients being returned from the Services table, but some of them have multiple records - which would account for the higher total count on the ServiceUnitsxxx status entries. Is that correct?

             

            God bless,

             

            Bill

            • 3. Re: ExecuteSQL  DISTINCT?

              Interesting!

               

              I would try a simpler (separate) FMP solution to just work on the SQL. I use a separate Java IDE so I can really mess around with the data quickly (with SQL statement creation and error assist -- in real time, etc.) outside of the Data Viewer.

               

              Alternatively, could you post your "ERD" or relevant portion from the Relationship's tab and also a subset of the data?

               

              Irrespective of my comment about avoiding the data viewer for generating the initial SQL (via JDBC driver to the FM database), I would recommend using the data viewer to so SUMs on your various fields and see if the counts are correct when you subtract the duplicate records.

               

              With DISTINCT, you should not get any duplicate client ids in this case. Something sounds off.

               

              You can use the "!" in the various fields in Find mode to help find duplicates in one more more fields interactively.

               

              Consider posting portion of your data in CSV so others can better help.

               

              - m

              • 4. Re: ExecuteSQL  DISTINCT?
                Dhrakar

                Hi Bruce,

                  Normally, the DISTINCT clause goes outside of any parenthesis.  That is, you currently have the DISTINCT applying only to your count and not all of the fields.  Perhaps:

                 

                SELECT DISTINCT

                  COUNT(...), SUM(...), SUM(...)

                FROM

                  ...

                will do what you need.  Note, too, that SQL normally requires you to use GROUP BY whenever you do aggregations on a field.  So you should also have:

                  GROUP BY a.\"ServiceUnitsApproved\", a.\"ServiceUnitsDenied\", a.\"ServiceUnitsPending\"

                in your query.

                 

                Regards,

                Derek

                • 5. Re: ExecuteSQL  DISTINCT?
                  beverly

                  Derek, can you cite documentation where GROUP BY is required if you use aggregates (& NO other types of columns) in  a SQL query?

                   

                  SQL GROUP BY Statement

                  SQL Group By

                  GROUP BY clause | SQLCourse - Advanced Online SQL Training

                  SQL: GROUP BY Clause

                   

                  beverly

                  • 6. Re: ExecuteSQL  DISTINCT?
                    Steve Wright

                    Currently to get the correct results,  I resorted to standard FileMaker scripting, looped through the records and omit the duplicate IDs. 

                     

                    Hi Bruce,


                    What's the deciding factor as to which records get's omitted, aside from the ID being a duplicate?


                    The reason I ask is.. (and I might be reading it wrong)  surely this would produce skewed results, whereas the SQL statement you have is providing correct results?

                     

                    From what I understand (ignoring the dates etc for a moment) you have records such as :

                     

                    ClientID | Approved | Denied | Pending

                    --------------------------------------------------------

                    ClientA  | A1 | D0 | P0

                    ClientA  | A1 | D0 | P0

                    ClientA  | A0 | D1 | P0

                    ClientB  | A0 | D0 | P1


                    As evident by the query looking at a single table for its data.  Resulting in (from my sample):

                    2 Clients  |  A2  | D1  |   P1


                    Using the scripted approach, If you  omitted record 1 and 2, you would get

                    2 Clients | A0  | D1  | P1

                     

                    If you omitted record 2 and 3,

                    2 Clients | A1  | D0  | P1


                    I think a piece of the puzzle may be missing.. because to me, the SQL result would be correct with 2, 1, 1.


                    • 7. Re: ExecuteSQL  DISTINCT?
                      electon

                      If you group by a.ClientID the Count of a.ClientID should return distinct values.

                      So no DISTINCT needed and the sums should also be grouped.

                      • 8. Re: ExecuteSQL  DISTINCT?

                        Hi Bev,

                         

                        I agree, I don't think the GROUP BY is necessary here, either.

                         

                        I think the problem is that DISTINCT is only showing counts on DISTINCT ClientIDs. Why there are duplicates of what should be a primary key (unique) is a serious design problem.

                         

                        To wit, how is the SQL Engine supposed to know how to SUM the records for the duplicated ClientID?

                         

                        IOW, which of the duplicated fields should it SUM? If you have two duplicated "ClientID" fields with different values in the other fields, but you do a SELECT DISTINCT on the clientID field, I don't think the SQL Engine can (or should be able to) decide which of the records for the duplicated ClientID field to use in the SUM.

                         

                        I created a sample database with 11 client records, two CLIENTIDs were duplicates.

                         

                        select COUNT(DISTINCT(ClientID)) as cnt,

                        sum(Count1) as sum_count1,

                        sum(count2) as sum_count2,

                        sum(Count3) as sum_count3
                        from counts


                        gave ...

                         

                        I believe the solution is to fix the CLIENTID field to be unique.

                         

                        - m

                        • 9. Re: ExecuteSQL  DISTINCT?
                          electon

                          morkus wrote:

                           

                          I think the problem is that DISTINCT is only showing counts on DISTINCT ClientIDs. Why there are duplicates of what should be a primary key (unique) is a serious design problem.

                           

                          I believe the solution is to fix the CLIENTID field to be unique.

                           

                          Clearly the field ClientID is a foreign key here, so I don't think this is where the problem lies.

                           

                          If someone really tries to avoid a GROUP BY statement then use the INNER JOIN to the client table.

                           

                          SQLResult = ExecuteSQL (

                          "SELECT COUNT (b.\"ID\"), SUM (a.\"ServiceUnitsApproved\" ), SUM( a.\"ServiceUnitsDenied\"), SUM( a.\"ServiceUnitsPending\")

                          FROM \"Services\" a

                          INNER JOIN \"Clients\" b ON \"b.ID\" = \"a.ClientID\"

                          WHERE a.\"ServiceDate\" >= ? AND a.\"ServiceDate\" <= ? AND a.\"ServiceReceived\" = ?" ;

                           

                          This should produce a Count of client id's that have records in the table you're looking at, with relevant sums per Client.

                           

                          And I misspoke trying to type too fast.

                           

                          This will give you only a count of 1 for each customer. So it's no good for the situation at hand. I'd use GROUP BY.

                          • 10. Re: ExecuteSQL  DISTINCT?
                            Dhrakar

                              Funny you should ask that, Bev :-)  After looking at the various sites and documentation, I now realize (remember?) that the GROUP BY is only required if you are mixing aggregate and non-aggregate fields.  For example, a query that returns a city name from each row and a sum of sales for that city.  The query from Bruce has all aggregates, so would not need a GROUP BY.  However, if he had included a client ID or name, it would have had to have group by for those fields.

                              Thanks for the reminder!

                            SQL GROUP BY

                            Grouping Rows with GROUP BY

                            SQL: GROUP BY Clause

                            • 11. Re: ExecuteSQL  DISTINCT?

                              There is only one table listed in the FROM clause. And, each field uses the "a" alias.

                               

                              SQLResult = ExecuteSQL (

                              "SELECT COUNT( DISTINCT a.\"ClientID\"), SUM (a.\"ServiceUnitsApproved\" ), SUM( a.\"ServiceUnitsDenied\"), SUM( a.\"ServiceUnitsPending\")

                              FROM \"Services\" a

                              WHERE a.\"ServiceDate\" >= ? AND a.\"ServiceDate\" <= ? AND a.\"ServiceReceived\" = ?" ;

                               

                              How did you conclude there were two tables?

                               

                              Again, I don't see how GROUP BY can alone can solve the problem of telling the SQL engine which of the duplicated CLIENTIDs to use (for summing other fields values in that record) -- unless there is another discriminator column in the WHERE clause.

                               

                              I would like to see his script that produces the correct result. He clearly must have some rule in there to make sure he gets the duplicated clientID he wants.

                               

                               

                              - m

                              • 12. Re: ExecuteSQL  DISTINCT?
                                electon

                                By method of deduction.

                                 

                                1) The table is called Services. It's usually something that is provided to clients and one client can have many services provided to him. That's why it's totally ok to have duplicates of ClientID.

                                2) If it was not the case and it was already an aggregate table Bruce would not need to Sum stuff there because it would already be so.

                                3) ClientID field naming suggests it's a foreign key. ( not 100% bulletproof but a primary key is usually called ID, and the meaning is derived from the table name ) as opposed to a foreign key that suggests by it's name what it might relate to.

                                • 13. Re: ExecuteSQL  DISTINCT?

                                  See if you can get an actual sample as you're indicating to work.

                                   

                                  FMP does not require the sums to be grouped in the SQL:

                                   

                                  ExecuteSQL ( "select  count(DISTINCT (ClientID)) as clientID,

                                  sum(Count1) as sum_count1,

                                  sum(count2) as sum_count2,

                                  sum(Count3) as sum_count3

                                  from counts"  ; "" ; ¶ )

                                   

                                  - m

                                  • 14. Re: ExecuteSQL  DISTINCT?
                                    beverly

                                    BUMP! again, it's difficult to get the correct query if we don't know the "Correct result" as expected by BruceHerbach

                                    If we had a posting of the script, perhaps we can condense to ExecuteSQL?

                                     

                                    beverly

                                    1 2 Previous Next