12 Replies Latest reply on Apr 29, 2014 6:57 AM by jbrown

    How Can I Find the Median of a Dataset?

    richardsrussell

      FileMaker Pro has built-in functions that let me find record counts, averages, minima, maxima, fractions of total, variance, standard deviation, etc. but nothing to generate a median (which is often a more useful representative statistic than the mean). Does anyone know of a way to calculate the median of a numeric value in a subset of data?

        • 2. Re: How Can I Find the Median of a Dataset?
          jbrown

          Good morning.

          There is no function of finding the median.

          You basically have to run this via a script:

          1. Find a set of records for which you want the median.

          2. Sort them by the field for which you want the median.

          3. Determine which number to use for the median:

          4. Report.

           

          For step 3, you have to see what the found set is:

          IF the found set is an odd number, you simply go to the middle record:

            IE: If there are 7 records, go to record number:  Ceiling ( 7 / 2).  7/2 = 3.5.  Go to the 4th record.

           

          If the found set is an even number, its is more complicated. YOu have to find the middle two values, then find the middle of those two:

           

          There are 6 records in the found set. The field values are:

           

          1

          2

          5

          7

          8

          20

           

          So I'd go to the 3rd record. (GetFoundCount / 2). Get its value (5). Put 5 in a variable.  Go to the next record and get its value (7) and put it in a variable. Then I can find the average of these two variables:  5 + 7 = 12 / 2 = 6.

          In this case the median is 6.

           

          You have to script all of this because the found count could change and you have to do a branching script step if the found set count is even or odd.

           

          I worked this out on a process in my school's database. It works very well. I'm happy to give more details if needed.

          • 3. Re: How Can I Find the Median of a Dataset?
            mikebeargie

            Also, you can use a custom function:

             

            https://www.briandunning.com/cf/298

            Let ([
            count = WordCount ( valueList ) ;
            odd = MiddleValues ( valueList ; Truncate (count/2;0) +1 ; 1 );
            even = MiddleValues ( valueList ; Truncate(count/2;0) ; 1 )
            ];
            Case(
            Mod ( count ; 2 ) ; odd ; (even+ odd) / 2
            )
            )

             

            I believe the above assumed the valueList value you pass is already sorted in ascending order.

            • 4. Re: How Can I Find the Median of a Dataset?
              richardsrussell

              No, this is quite a thoro treatment of the question, for which I thank you, Jeremy.

               

              I was really hoping for a summary function akin to "Average" to do the work for me, but I'll undoubtedly end up using your approach if nothing else turns up. The nuisance, of course, is that I want to find the median not just for a single dataset but for each subgroup of a sorted larger group (in this case, different animal breeds within a species), which would require multiple passes thru a median-calculating script.

              • 5. Re: How Can I Find the Median of a Dataset?
                jbrown

                RIchard,

                I agree. It is a pain, but, as my math students could attest, it has to be handled individually with each found and sorted set.

                I haven't tried the custom function that Mike posted here, but I was in your same boat: I had to generate medians for the whole grade and for each class period within that grade for all subjects. I chose to run it through a looping script, doing the finds in the same table and the reporting out the median in a separate "Median" table.  It really doesnt take long, AND I set up my script to UPDATE median records that were created already. The first time the "Spring / Math / Period 4" median record was created, but subsequent runs of the script would simply update the median. The user can choose to update it whenever they run the printed-median report script.

                • 6. Re: How Can I Find the Median of a Dataset?
                  jbrown

                  Mike,

                  That would probably require another CF, wouldn't it (I think there are CFs out there that sort lists).

                   

                  I've not used this, but I did see it. I was interested in the challenge to create the process via a script. I wanted to get my hands dirty in it.

                  • 7. Re: How Can I Find the Median of a Dataset?
                    richardsrussell

                    Boy, I lit right up when I saw the words "custom function", because a function is exactly what I'm looking for.

                     

                    But, on closer examination, it appears that "ValueList" encompasses the entire table, when what I need is specifically the median for different subsets of data within a table (for example, when sorting by "Breed", separate medians for Labrador retrievers, poodles, German shepherds, chihuahuas, wire-haired terriers, Brittany spaniels, etc. — or, if sorting instead by "Sex", for male intact, male neutered, female intact, female spayed, and unknown). The ideal would be something that works just like the "Count" or "Average" summary fields, where they don't care WHAT you've sorted by.

                    • 8. Re: How Can I Find the Median of a Dataset?
                      jbrown

                      You could pass in the sorted found set of records, maybe with ExecuteSQL.

                      That would work pretty well I think.

                       

                      Using execute SQL find the numeric data you want for all records that have a breed of "Labrador", sorted ascending.

                       

                      But again, that would all have to be scripted: The breed names, the gender, etc. to put inside the executeSQL statement.

                      • 9. Re: How Can I Find the Median of a Dataset?
                        Mike_Mitchell

                        Richard -

                         

                        Another option you can try is to use a global field and a self-join. Set the global to the particular value you want (sequentially) - male intact, male neutered, etc. Extract the list of values using the List ( ) function, then pass that list through the custom function.

                         

                        Mike

                        • 10. Re: How Can I Find the Median of a Dataset?
                          kamar

                          Hi Richard,

                           

                          If you are open to using a plugin,  then my IDMA plugin   http://www.idma.co.nz/idmaplugin.php   has a 'Number Array' function which can be used to store each data set, then return the 'Median'  (as well as average/mean, SD, quartiles, etc) on each data set.

                           

                          With a bit of logic - you should be able to do all from one pass (or relationship) over your data set - then just pull out the median for each sub-group.

                           

                          Happy to go into more detail if you want.

                           

                          Kent.

                          • 11. Re: How Can I Find the Median of a Dataset?
                            richardsrussell

                            Thanks, Jeremy. This is what I ended up using. I created a separate global field to hold the result of the median-value calculation, so I'd be able to display it along with the data set's max, mean, min, and standard deviation — all statistics which FMP provides as standard features. The script itself was actually fairly easy to put together, which makes it even more puzzling that FMI wouldn't have just incorporated "Median" as a regular function.

                            • 12. Re: How Can I Find the Median of a Dataset?
                              jbrown

                              You're welcome Richard.

                              I've often wondered about the median function. The only thing I can think of is that since median is a stat that requires a sorted found-set list, their internal function would have to sort the list first. I can't think of another function that has to sort the list first before doing something to it. 

                              The script is a good thing. I'm also playing with the custom function shown here to see which performs better and under which circumstances.