
    Counting Unique Values

    sccardais


           I want to count the unique values in a number field, "OrgID".

           I am using FMP 13 Advanced.

           I tried the formula below, but it's not working.

           Is there something wrong with the syntax? The table name is "Table" and the field name is "OrgID".

      [Screenshot of the ExecuteSQL calculation: 25-Aug-2014_05_36_pm.png]

        • 1. Re: Counting Unique Values
          philmodjunk

                Table is a lousy name for a table in a database! Give it a descriptive name so you know why you have that table when you revisit the design of your database 6 months later!

                The specific issue it creates here is that TABLE is a reserved keyword in SQL, which is why you are getting the "?" result that indicates a syntax error.

               You often have to enclose table and field names inside double quotes before the query will work. Use this query:
               SELECT Count ( DISTINCT OrgID ) From \"Table\"
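                For reference, the complete calculation around that query would look something like this (the two empty strings just accept the default field and row separators):

                ExecuteSQL ( "SELECT COUNT ( DISTINCT OrgID ) FROM \"Table\"" ; "" ; "" )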

          Caulkins Consulting, Home of Adventures In FileMaking

          • 2. Re: Counting Unique Values
            sccardais

                  Changing the table name to Users and wrapping it in double quotes worked, but it's incredibly slow!

                  There are only about 13,300 records and it's taking about 2-3 minutes per record!

                  Why would that be?

                  Is there anything I can do to interrupt this process? At this rate, it will take about 22 days for this process to finish!

            • 3. Re: Counting Unique Values
              philmodjunk

                   ExecuteSQL isn't using a native SQL engine like you might see with a SQL query in another database application. Speed over large data sets is not its forte.

                    Plus, was OrgID an indexed field before you tried this query? If not, the first thing FileMaker will do is create an index for the field, and with 13,000+ records that will take a while. But if the field can be indexed and you don't prevent it, this is a one-time delay the first time the query evaluates.

                   There are several alternative approaches to getting a unique count, but first make sure that this field is indexed.

                    You can sum the reciprocal: How to count the number of unique occurrences in field.

                    You can import these records into another table and use a "unique values" validation to filter out the duplicates.

                    You can set up a value list based on this field and then use ValueCount ( ValueListItems ( ... to count the unique values (see the sketch below).

                   But I can't promise that any of them will be quicker than ExecuteSQL. You'd have to try them and see.
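                    As a sketch of that value list approach, assuming a value list named "OrgIDs" (a hypothetical name) defined to use values from the OrgID field:

                    ValueCount ( ValueListItems ( Get ( FileName ) ; "OrgIDs" ) )

                    A field-based value list lists each indexed value only once, which is why counting its items gives the number of unique OrgIDs.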

              • 4. Re: Counting Unique Values
                sccardais

                     Phil:

                     This is a mystery. The ExecuteSQL calculation is taking literally over 3 minutes per record. (3 mins * 13,325 records = 666 hours!) I know that cannot be correct. It makes me think there's something wrong with my data.

                      Note: The field, OrgID, is a number and it is indexed, as you can see from the mashup screenshot below.

                     I also tried another approach that marks the first occurrence of a value in a list sorted by the value. 

                      c_MarkFirstOccurrences =

                      Let ( x = GetNthRecord ( OrgID ; Get ( RecordNumber ) - 1 ) ;

                      If ( OrgID <> x ; 1 ; "" )

                      ) // end let

                      This didn't work either. After sorting on OrgID, the calculation above did not behave consistently: some (not all) OrgIDs that appeared identical to the one above them were still flagged with a 1.

                      Any suggestions?

                • 5. Re: Counting Unique Values
                  misham

                        The syntax is:

                        ExecuteSQL ( sqlQuery ; fieldSeparator ; rowSeparator { ; arguments... } )

                        So maybe this can help: you can explicitly specify the newline character as the rowSeparator.

                        ValueCount ( ExecuteSQL ( "SELECT DISTINCT OrgID FROM \"Table\" WHERE OrgID IS NOT NULL" ; "" ; "¶" ) )

                  • 6. Re: Counting Unique Values
                    philmodjunk

                         Since the query returns a single aggregate value, I don't see how adding ¶ as a row separator would have any effect on how fast the query evaluates.

                          With c_MarkFirstOccurrences, was that a stored or unstored calculation? And GetNthRecord can max out system resources on large record sets and then return "?" instead of a value, which would make the comparison flag those records. So if all but the first few hundred records are showing up with a "1", that's probably what happened here.

                          You can use a self-join relationship to mark first records without using GetNthRecord, and you don't even have to sort the records to group them by OrgID:

                          Users::OrgID = Users 2::OrgID

                          The trick is to have a number field in Users loaded with a serial number. For existing records, you can use Replace Field Contents as a one-time fix, and if you set this up as an auto-entered serial number field, new records will be numbered automatically. Then you sort your relationship by this added serial number field.

                         c_MarkFirstOccurrences can then be defined as:

                         NewSerialNumber = Users 2::NewSerialNumber

                         The above expression will be true and evaluate as 1 only for the first instance of each value of OrgID.
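                          Putting those pieces together, a sketch of the setup described above:

                          /* Relationship, sorted ascending by NewSerialNumber on the Users 2 side:
                             Users::OrgID = Users 2::OrgID */

                          // c_MarkFirstOccurrences, defined in Users:
                          NewSerialNumber = Users 2::NewSerialNumber

                          Because the relationship is sorted, Users 2 always refers to the lowest-numbered record in each OrgID group, so the comparison is true only for that first record. A summary field totaling c_MarkFirstOccurrences would then give the unique count.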

                          But I've been thinking about this and would like to suggest a possible alternative that will take some time for the initial setup, but should be very fast from that point forward:

                          Add a new table to your file, unless you find that you already have a table matching what I am about to describe. Define the same OrgID number field in this table, and set validation options on that field to Unique Values, Validate Always. Now use Import Records to import all records from Users into this table. The field validation will omit all duplicates, so the count of records in this table will be the number of unique values for OrgID in the Users table.

                          That import will take some time, but once it's done you'll be able to get a fast count of the unique values anytime you need it, provided that adding a new value to OrgID in Users is set up as the process of first creating a new record in the new table and then copying that record's unique OrgID value to the OrgID field in Users. Essentially, this makes OrgID a primary key in the new table and turns OrgID in Users into a foreign key field. I'd then link the two tables by OrgID in a relationship, as it is very likely that I'd find such a relationship useful.
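                          Once that table is in place, getting the count is cheap. For example (assuming the new table is named UniqueOrgs, the name used later in this thread):

                          ExecuteSQL ( "SELECT COUNT ( * ) FROM UniqueOrgs" ; "" ; "" )

                          or simply Get ( TotalRecordCount ) evaluated from a layout based on UniqueOrgs.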

                    • 7. Re: Counting Unique Values
                      misham

                            Yes, Phil, adding a newline character won't improve the speed, but it will make sure the values are separated properly so they can be counted later. Mainly to avoid any inconsistency in the output.

                      • 8. Re: Counting Unique Values
                        philmodjunk

                              Sorry, but there are no "values to count later," as this expression just returns one single value.

                        • 9. Re: Counting Unique Values
                          sccardais

                               Phil:

                                Thanks. Very practical suggestion that worked perfectly. I got as far as the import and have not yet set up the second part of your suggestion, but it makes perfect sense.

                               Regarding the speed issue, might it help to increase the file cache size? It's now set at 128 MB. I have plenty of RAM and disk space, too.

                               Thanks again for the help on this.

                               MISHA MAHTO: Thank you for your input and help as well.

                          • 10. Re: Counting Unique Values
                            sccardais

                                  I've given this a little more thought, too. Several thousand new users are added to the Users table monthly, so adding them to the UniqueOrgs table doesn't sound practical.

                                 Instead, I think we'll need to import into Users and automate the import process in UniqueOrgs to replace existing values.

                                 FYI only.


                            • 11. Re: Counting Unique Values
                              philmodjunk

                                    Those several thousand users need only be added to that table once, at the point when they first become part of the system, and to me the large number of records makes this table more necessary, not less practical.


                                        Instead, I think we'll need to import into Users and automate the import process in UniqueOrgs to replace existing values.

                                    That raises interesting questions about why you have duplicate values in your Users table to begin with ;-) But if that works, you have an even simpler data model to work with. (I mainly suggested a new table because I assumed you needed those duplicate values in the current table for some reason, and the separate table let you keep that table unmodified.)

                              • 12. Re: Counting Unique Values
                                sccardais

                                     Users are unique. Orgs are not. Many users at one Org.

                                     I wanted to count unique orgs in the Users table.

                                     Thanks.