13 Replies Latest reply on May 4, 2017 12:44 PM by siplus

    CHALLENGE - Compare two lists (subtype: genetic differences)

    DrewTenenholz

      Hi All --

       

      Here's something I've been having trouble getting correct, and would like your help so I stop banging my head against the wall.

       

      Q: How many differences are there between two different individuals at a single genetic locus?

       

      ...at least that's the question I'd like to answer, but I think this could also be generalized to use methods comparing two lists of values.  In my case, ListA & ListB will only ever have two values each, but I think any successful calculation should be able to compare lists of varying lengths (and even lists with different lengths).
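Viewed generally, this is a multiset (bag) difference: pair each value in ListA with at most one identical value in ListB, then count what is left unpaired. Below is a minimal sketch in Python rather than FileMaker, using `collections.Counter`. The function name `count_differences` is hypothetical. For lists of different lengths, it assumes differences are counted from ListA's side, so swapping the lists can change the count.

```python
from collections import Counter


def count_differences(list_a, list_b):
    """Count values in list_a that have no matching partner in list_b.

    Counter subtraction keeps only positive counts, so each value in
    list_b can absorb at most one identical value from list_a. That is
    what makes homozygous pairs such as M1/M1 come out right.
    """
    unmatched = Counter(list_a) - Counter(list_b)
    return sum(unmatched.values())


print(count_differences(["M1", "M1"], ["M1", "M2"]))  # 1
print(count_differences(["M1", "M2"], ["M2", "M1"]))  # 0
print(count_differences(["M1", "M1", "M3"], ["M1"]))  # 2 (lists of different lengths)
```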

       

      Below is a chart that explains what I'm trying to do.  It shows the alleles (values) at one locus for two individuals and the expected result.  Next to that are columns showing what happens when I use various ValueList comparison functions: FilterValues, and Ray Cologon's excellent XORValues and ZAPValues.  Unfortunately, none of these gives an answer that is always correct.

       

      Below the chart is a long calculation that you can throw into the Data Viewer, which runs the nine separate tests and spits out readable results.  I hope that will help anyone willing to take on the challenge and find a working solution.

       

      Thanks for taking a look at this!

      -- Drew

       

      P.S.  In the end, I could work out a more 'manual'/'explicit' way of getting my answer; after all, there are only ever two values to compare to two others.  But I'm hoping for something more efficient and elegant.

        

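      List A = Individual A; List B = Individual B

      Test | A Homologue1 | A Homologue2 | B Homologue1 | B Homologue2 | Expected Differences | FilterValues      | XORValues         | ZAPValues
      -----|--------------|--------------|--------------|--------------|----------------------|-------------------|-------------------|------------------
      1    | M1           | M1           | M1           | M1           | 0                    | 2 ( should be 0 ) | success           | success
      2    | M1           | M1           | M1           | M2           | 1                    | 2 ( should be 1 ) | success           | 0 ( should be 1 )
      3    | M1           | M1           | M2           | M1           | 1                    | 2 ( should be 1 ) | success           | 0 ( should be 1 )
      4    | M1           | M1           | M2           | M3           | 2                    | 0 ( should be 2 ) | 3 ( should be 2 ) | success
      5    | M1           | M2           | M1           | M1           | 1                    | success           | success           | success
      6    | M1           | M2           | M1           | M2           | 0                    | 2 ( should be 0 ) | success           | success
      7    | M1           | M2           | M2           | M1           | 0                    | 2 ( should be 0 ) | success           | success
      8    | M1           | M2           | M2           | M3           | 1                    | success           | 2 ( should be 1 ) | success
      9    | M1           | M2           | M2.1bR57     | M3           | 2                    | 0 ( should be 2 ) | 4 ( should be 2 ) | success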

       

       

       

       

      Let ( [
      ListA1=List("M1";"M1")
      ; ListB1=List("M1";"M1")

      ; ListA2=List("M1";"M1")
      ; ListB2=List("M1";"M2")

      ; ListA3=List("M1";"M1")
      ; ListB3=List("M2";"M1")

      ; ListA4=List("M1";"M1")
      ; ListB4=List("M2";"M3")

      ; ListA5=List("M1";"M2")
      ; ListB5=List("M1";"M1")

      ; ListA6=List("M1";"M2")
      ; ListB6=List("M1";"M2")

      ; ListA7=List("M1";"M2")
      ; ListB7=List("M2";"M1")

      ; ListA8=List("M1";"M2")
      ; ListB8=List("M2";"M3")

      ; ListA9=List("M1";"M2")
      ; ListB9=List("M2.1bR57";"M3")

      ; expectedResult1= 0
      ; expectedResult2= 1
      ; expectedResult3= 1
      ; expectedResult4= 2
      ; expectedResult5= 1
      ; expectedResult6= 0
      ; expectedResult7= 0
      ; expectedResult8= 1
      ; expectedResult9= 2

      ; test1= FilterValues ( ListA1 ; ListB1 )
      ; test2= FilterValues ( ListA2 ; ListB2 )
      ; test3= FilterValues ( ListA3 ; ListB3 )
      ; test4= FilterValues ( ListA4 ; ListB4 )
      ; test5= FilterValues ( ListA5 ; ListB5 )
      ; test6= FilterValues ( ListA6 ; ListB6 )
      ; test7= FilterValues ( ListA7 ; ListB7 )
      ; test8= FilterValues ( ListA8 ; ListB8 )
      ; test9= FilterValues ( ListA9 ; ListB9 )

      ; countTest1= ValueCount ( test1 )
      ; countTest2= ValueCount ( test2 )
      ; countTest3= ValueCount ( test3 )
      ; countTest4= ValueCount ( test4 )
      ; countTest5= ValueCount ( test5 )
      ; countTest6= ValueCount ( test6 )
      ; countTest7= ValueCount ( test7 )
      ; countTest8= ValueCount ( test8 )
      ; countTest9= ValueCount ( test9 )

      ; headlineResult= List (
        "test1= " & Case ( countTest1 = expectedResult1 ; "success" ; countTest1 & " ( should be " & expectedResult1 & " )" )
      ; "test2= " & Case ( countTest2 = expectedResult2 ; "success" ; countTest2 & " ( should be " & expectedResult2 & " )" )
      ; "test3= " & Case ( countTest3 = expectedResult3 ; "success" ; countTest3 & " ( should be " & expectedResult3 & " )" )
      ; "test4= " & Case ( countTest4 = expectedResult4 ; "success" ; countTest4 & " ( should be " & expectedResult4 & " )" )
      ; "test5= " & Case ( countTest5 = expectedResult5 ; "success" ; countTest5 & " ( should be " & expectedResult5 & " )" )
      ; "test6= " & Case ( countTest6 = expectedResult6 ; "success" ; countTest6 & " ( should be " & expectedResult6 & " )" )
      ; "test7= " & Case ( countTest7 = expectedResult7 ; "success" ; countTest7 & " ( should be " & expectedResult7 & " )" )
      ; "test8= " & Case ( countTest8 = expectedResult8 ; "success" ; countTest8 & " ( should be " & expectedResult8 & " )" )
      ; "test9= " & Case ( countTest9 = expectedResult9 ; "success" ; countTest9 & " ( should be " & expectedResult9 & " )" )
      )

      ; testResult= List (
        "test1= " & test1
      ; "test2= " & test2
      ; "test3= " & test3
      ; "test4= " & test4
      ; "test5= " & test5
      ; "test6= " & test6
      ; "test7= " & test7
      ; "test8= " & test8
      ; "test9= " & test9
      )
      ];
      List ( headlineResult ; "¶Individual Results:" ; testResult )
      )

        • 1. Re: CHALLENGE - Compare two lists (subtype: genetic differences)
          fmpdude

          Could you perhaps give a simpler example with less data (a few rows) that still represents what you're doing?

           

          A spreadsheet with data "M1" "M2" is a bit confusing.

           

          A spreadsheet should use some real data and the exact operations you're doing (not FMP specific) with expected values (XOR or whatever).  If you "describe" what you're trying to do at every level, it will help all of us understand what's going on.

          • 2. Re: CHALLENGE - Compare two lists (subtype: genetic differences)
            DrewTenenholz

            fmpdude --

             

            Sorry for any confusion.  The data in the spreadsheet is, in fact, the data I'm comparing.  The genetic locus in question here is pretty unimportant, but the possible alleles are M1, M2, etc., as shown.  And, of course, you get one copy of that gene from your mom and one from your dad.  Thus, you could be said to be M1/M1.  Another person could be M2/M1, which means you match on one allele and differ on the other.  As you'll see here, order is unimportant: one match is a match, and I don't really care which of your M1 genes you call the match; the other one is what we call the "difference".

             

            The reason for the nine tests is that this is the real-world data I will see: some individuals are homozygous (mom & dad gave you the same allele) and others are heterozygous (different alleles from mom and dad).  When comparing two individuals at one locus, I want to know how many genes you have 'in common'.  (By the way, this is the sort of comparison used in organ transplantation to find a 'match'.  Folks do this comparison over 5 or more genes and figure out how close a match the donor is to the recipient.  Usually they don't ask what the specific differences are; they just want a count of the differences across the 5+ genes.)

             

            So, the Expected Differences column above is what I need FileMaker to return to me.  And the nine tests listed are pretty much the essential tests to try for the eventual calculation, since these are the genotypes I'm seeing.

             

            I've been struggling with finding the correct language to even call this.  It's sort of like XOR and kind of like FilterValues, but not quite, which is probably why they don't give the right answer.  This is one of those things a human does nearly instantly and we struggle to put into 'computerese'.

             

            Maybe this helps:

            "If I'm M1/M1 at geneX and my donor is M1/M1 at geneX, I have NO differences."
            "If I'm M1/M1 at geneX and my donor is M1/M2 at geneX, I have 1 difference."
            "If I'm M1/M1 at geneX and my donor is M2/M1 at geneX, I have 1 difference."
            "If I'm M1/M1 at geneX and my donor is M2/M3 at geneX, I have 2 differences."

            That covers the case where I'm homozygous at geneX.

            When I'm heterozygous at geneX, the question changes a little:

            "If I'm M1/M2 at geneX and my donor is M1/M1 at geneX, I have 1 difference."
            "If I'm M1/M2 at geneX and my donor is M1/M2 at geneX, I have NO differences."
            "If I'm M1/M2 at geneX and my donor is M2/M1 at geneX, I have NO differences."
            "If I'm M1/M2 at geneX and my donor is M2/M3 at geneX, I have 1 difference."

             

            Basically, any human can do this calculation nearly instantly.  If I try to break it down into discrete steps:

            I look at allele 1 of IndividualA and see if it matches allele 1 of IndividualB.  If so, I drop that allele from BOTH IndividualA and IndividualB.  If not, I compare it to allele 2 of IndividualB and drop both if they match; otherwise, it is a 'difference'.  Then I repeat with allele 2 of IndividualA, using only the alleles of IndividualB that were not matched in the first round.
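The manual procedure above can be sketched directly. Walk through IndividualA's alleles, remove the first identical allele still available in a working copy of IndividualB's list, and count an allele as a difference when no partner is left. This is a Python sketch, not FileMaker. `count_differences_stepwise` and `CASES` are hypothetical names, and the nine cases are the ones from the original chart.

```python
def count_differences_stepwise(alleles_a, alleles_b):
    """Match each allele of A against the not-yet-matched alleles of B."""
    remaining_b = list(alleles_b)  # working copy; the caller's list is untouched
    differences = 0
    for allele in alleles_a:
        if allele in remaining_b:
            remaining_b.remove(allele)  # drop the matched pair from both sides
        else:
            differences += 1  # no partner left in B, so this is a difference
    return differences


# (Individual A, Individual B, expected differences) from the chart
CASES = [
    (["M1", "M1"], ["M1", "M1"], 0),
    (["M1", "M1"], ["M1", "M2"], 1),
    (["M1", "M1"], ["M2", "M1"], 1),
    (["M1", "M1"], ["M2", "M3"], 2),
    (["M1", "M2"], ["M1", "M1"], 1),
    (["M1", "M2"], ["M1", "M2"], 0),
    (["M1", "M2"], ["M2", "M1"], 0),
    (["M1", "M2"], ["M2", "M3"], 1),
    (["M1", "M2"], ["M2.1bR57", "M3"], 2),
]

for number, (a, b, expected) in enumerate(CASES, start=1):
    result = count_differences_stepwise(a, b)
    status = "success" if result == expected else f"{result} ( should be {expected} )"
    print(f"test{number}= {status}")
```

Because each match removes exactly one allele from the working copy, the same loop also works for lists longer than two values or of unequal length.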

             

            I could probably write a recursive CF (custom function) to do this, but I'm hoping the correct application of existing tools can do it.

             

            -- Drew

             

            P.S.  It sort of has a 'bubble sort' feeling to it as well as the value comparisons....

            • 3. Re: CHALLENGE - Compare two lists (subtype: genetic differences)
              siplus

              Here you are. Using arrays, no CFs. Just plain old FileMaker 6 stuff.

               

              See attachment.

              • 4. Re: CHALLENGE - Compare two lists (subtype: genetic differences)
                fmpdude

                You're right, the calculations look easy.

                 

                Do you have an ERD or possibly a representative sample FMP database with values?

                • 5. Re: CHALLENGE - Compare two lists (subtype: genetic differences)
                  DrewTenenholz

                  So, the database itself is way more complex.  Imagine lists of individuals, some flagged as donors, some as recipients, and some donors linked to more than one recipient.  Only some of the records contain the genetic data for this mismatch calculation; others have all sorts of other data about the individuals.


                  I think I see how I might have been misleading.  I'm giving nine examples of the kind of comparisons I want to do, but I'll only do one for any given gene.  (I do have five genes though.....)  And, I'm looking to do this as a calculation, since that's what's going on in the database already.

                   

                  I have a pretty direct way to feed the calc. the information on the donor and the recipient, and need a mismatch score.  What I have now looks like what's below (for just one gene).  But, it's wrong, since it only compares the recipient homologue1 to the donor homologue1.  I can do as Siplus suggests, manually adding the comparison of the opposite pair of homologues, but that's not very generalizable to arrays with more than two members or arrays with non-equivalent sizes.

                   

                  Let ( [
                  recipient_a1= Recipient::a1  // simplified for posting
                  ; recipient_a2= Recipient::a2

                  ; donor_a1= Donor::a1
                  ; donor_a2= Donor::a2

                  // compares homologue1 to homologue1 and homologue2 to homologue2 only
                  ; locus1_mismatch_count=
                      Case ( recipient_a1 = donor_a1 ; 0 ; 1 )
                      + Case ( recipient_a2 = donor_a2 ; 0 ; 1 )

                  ; result= locus1_mismatch_count
                  ];
                  result
                  ) // end Let
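To see why this calculation undercounts matches, here is the same position-by-position comparison as a Python sketch (not FileMaker; `positional_mismatches` is a hypothetical name). Against the nine cases from the chart, it gets tests 7 and 8 wrong, because it never considers a match that sits in the opposite homologue position.

```python
def positional_mismatches(recipient, donor):
    """Compare homologue1 with homologue1 and homologue2 with homologue2 only."""
    return sum(1 for r, d in zip(recipient, donor) if r != d)


print(positional_mismatches(["M1", "M2"], ["M2", "M1"]))  # 2, but 0 is expected (test 7)
print(positional_mismatches(["M1", "M2"], ["M2", "M3"]))  # 2, but 1 is expected (test 8)
print(positional_mismatches(["M1", "M1"], ["M2", "M1"]))  # 1, correct (test 3)
```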

                  • 6. Re: CHALLENGE - Compare two lists (subtype: genetic differences)
                    fmpdude

                    I'm not sure how you're using XOR, but you need to use it with binary (0 or 1) operands or it won't be reliably correct.

                     

                     

                    Thus, if you say that "M1" = 0 and "M2" = 1 (or vice versa), your XORs should be correct.

                     

                    For example, in FMP "M1" XOR "M2" gives a zero, which is not what we want, but we're not using XOR correctly here.

                     

                     

                    Yet, "M" XOR "M1" gives a 1.

                     

                    Weird.

                     

                    But if you use binary arguments, as XOR is really intended, you'll get the right result every time.

                     

                    That sort of looks like what your last posting is attempting to do.

                     

                    Be super careful with XOR. In Java, for example, it works with both integer (bitwise) operands and Boolean operands.

                     

                    In Java, for example, XOR really works the way you want with Boolean arguments:

                     

                     

                    But in Java you can also do bit math (from my testing, I think FMP does not support bit math; it seems to treat 0 as false and 1 as true, meaning Boolean interpretation only):

                     

                    (In Java...)

                    1 XOR 2  = 3  (not 1)


                    That's because this is really bit math:

                     

                           0001
                    xor    0010
                    -----------
                           0011 = 3
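Python makes the same distinction fmpdude describes for Java: its `^` operator is bitwise XOR on integers and logical XOR on `bool` values. A small illustration in Python (not FileMaker, which fmpdude's testing suggests interprets XOR only as a Boolean operation):

```python
# Bitwise XOR compares the binary digits of two integers:
#   0001  (1)
# ^ 0010  (2)
# = 0011  (3)
print(1 ^ 2)  # 3

# Logical XOR on bools: true when exactly one operand is true.
print(True ^ False)  # True
print(True ^ True)   # False
```

Either way, XOR compares two single values, so on its own it cannot count mismatches between two unordered pairs.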

                     

                    --------

                     

                    In any case, if siplus's posting is correct and what you want/need, please mark your posted question as "answered".

                    • 7. Re: CHALLENGE - Compare two lists (subtype: genetic differences)
                      siplus

                      DrewTenenholz wrote:

                       

                      I can do as Siplus suggests, manually adding the comparison of the opposite pair of homologues, but that's not very generalizable to arrays with more than two members or arrays with non-equivalent sizes.

                       

                      There's nothing to be done manually in my example, except entering the values in the 4 original arrays (but an array can be filled automatically from a list - not included in the example).

                       

                      Moreover, you said in your original post:

                       

                      In my case, ListA & ListB will only ever have two values each

                       

                      so the solution offered to your challenge sticks to what you asserted. Of course it won't work with lists of three or more members, but it works with the information offered.

                      • 8. Re: CHALLENGE - Compare two lists (subtype: genetic differences)
                        TomHays

                        For ListA and ListB each containing two values, the following function gives the expected differences for your cases.

                         

                        Let([
                        A1 = GetValue(ListA; 1);
                        A2 = GetValue(ListA; 2);
                        B1 = GetValue(ListB; 1);
                        B2 = GetValue(ListB; 2)
                        ];
                        Case(
                           (ListA = ListB) or (ListA = B2 & "¶" & B1); 0;
                           IsEmpty(FilterValues(ListB; A1)) and IsEmpty(FilterValues(ListB; A2)); 2;
                           1
                        )
                        )

                         

                         

                        It is probably a good idea to use it as a custom function.

                         

                        -Tom

                        • 9. Re: CHALLENGE - Compare two lists (subtype: genetic differences)
                          TomHays

                          Modifying this to use the inputs provided in a previous post.

                           

                          -Tom

                           

                          Let ( [
                          A1= Recipient::a1;
                          A2= Recipient::a2;
                          B1= Donor::a1;
                          B2= Donor::a2;
                          ListA = A1 & "¶" & A2;
                          ListB = B1 & "¶" & B2
                          ];
                          Case(
                             (ListA = ListB) or (ListA = B2 & "¶" & B1); 0;
                             IsEmpty(FilterValues(ListB; A1)) and IsEmpty(FilterValues(ListB; A2)); 2;
                             1
                          )
                          )

                          • 10. Re: CHALLENGE - Compare two lists (subtype: genetic differences)
                            TomHays

                            To clarify the logic used for the calculation...

                             

                            // No difference if the sets are identical or identical when you reverse one.

                            // e.g. M1/M2 vs M1/M2 or M1/M2 vs M2/M1

                               (ListA = ListB) or (ListA = B2 & "¶"& B1); 0;

                             

                            //  2 differences if neither one of A is present in either position of B

                               IsEmpty(FilterValues(ListB; A1)) and IsEmpty(FilterValues(ListB; A2)); 2;

                             

                            // 1 difference is the only remaining situation.

                               1

                             

                            -Tom
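If you want to check Tom's three cases outside FileMaker, here they are translated into Python. This is a sketch: `two_allele_differences` is a hypothetical name, and a membership test on B's two alleles stands in for the `FilterValues` checks. Like the original, it assumes exactly two values per individual. The tuple unpacking raises an error for any other length.

```python
def two_allele_differences(list_a, list_b):
    """Tom's three-case logic; assumes exactly two alleles per individual."""
    a1, a2 = list_a
    b1, b2 = list_b
    # No difference if the pairs are identical, or identical when B is reversed.
    if (a1, a2) == (b1, b2) or (a1, a2) == (b2, b1):
        return 0
    # Two differences if neither allele of A appears in either position of B.
    if a1 not in (b1, b2) and a2 not in (b1, b2):
        return 2
    # One difference is the only remaining situation.
    return 1


print(two_allele_differences(["M1", "M2"], ["M2", "M1"]))        # 0
print(two_allele_differences(["M1", "M1"], ["M2", "M1"]))        # 1
print(two_allele_differences(["M1", "M2"], ["M2.1bR57", "M3"]))  # 2
```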

                            • 11. Re: CHALLENGE - Compare two lists (subtype: genetic differences)
                              DrewTenenholz

                              I was trying Ray Cologon's excellent XORValues (http://www.briandunning.com/cf/39), which I use all the time for other things.  That's a very different function from the binary XOR built into FMP.  But, as it isn't helpful, I guess it's a distraction for this problem.  Same for Ray's ZapValues (also a great function, just not here).

                              • 12. Re: CHALLENGE - Compare two lists (subtype: genetic differences)
                                DrewTenenholz

                                So, siplus' answer is to replace

                                 

                                ; locus1_mismatch_count=

                                    Case ( recipient_a1 = donor_a1 ; 0 ; 1 ) // mismatch from homologue1 to homologue1

                                    + Case ( recipient_a2 = donor_a2 ; 0 ; 1 ) // mismatch from homologue2 to homologue2

                                 

                                with

                                 

                                ; locus1_mismatch_count= Min (
                                    2 - ( ( recipient_a1 = donor_a1 ) + ( recipient_a2 = donor_a2 ) ) // mismatch from homologue1 to homologue1 + homologue2 to homologue2
                                    ; 2 - ( ( recipient_a1 = donor_a2 ) + ( recipient_a2 = donor_a1 ) ) // mismatch from homologue1 to homologue2 + homologue2 to homologue1
                                )

                                 

                                throughout the entire calculation.  It's not elegant, but it can work.
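The idea behind this version is to score both possible pairings of the homologues (straight and crossed) and keep the smaller mismatch count. Below is a Python sketch of the same arithmetic; `min_pairing_differences` is a hypothetical name. The two match results must be summed before they are subtracted from 2, because `2 - x + y` evaluates as `(2 - x) + y`.

```python
def min_pairing_differences(recipient, donor):
    """Score both ways of pairing two homologues and keep the better one."""
    r1, r2 = recipient
    d1, d2 = donor
    # True/False add up as 1/0, like the 1/0 results of FileMaker comparisons.
    straight = 2 - ((r1 == d1) + (r2 == d2))  # homologue1-homologue1, homologue2-homologue2
    crossed = 2 - ((r1 == d2) + (r2 == d1))   # homologue1-homologue2, homologue2-homologue1
    return min(straight, crossed)


print(min_pairing_differences(["M1", "M2"], ["M2", "M1"]))  # 0
print(min_pairing_differences(["M1", "M2"], ["M2", "M3"]))  # 1
print(min_pairing_differences(["M1", "M1"], ["M2", "M3"]))  # 2
```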

                                ... and Tom is doing the same sort of thing, comparing the two possible variants and deciding what the mismatch is.  This is certainly one of those cases where 'the perfect is the enemy of the good'.

                                 

                                Sometimes I'm too much of a perfectionist....

                                • 13. Re: CHALLENGE - Compare two lists (subtype: genetic differences)
                                  siplus

                                  About turning a list into an array automatically via a calc.

                                   

                                  Given a list named myList (let's say 10 elements max, but it works the same with 1000 entries; just change the number of repetitions), you can define myArray as a calculation with 10 repetitions (or 1000, as I said; it's up to you), defined as follows:

                                   

                                  Let([

                                  i = Get ( CalculationRepetitionNumber ) ;

                                  i1 = Extend(myList) ];

                                  GetValue(i1; i)

                                  )
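Outside FileMaker, this repeating calculation simply produces a fixed-size array. The i-th slot holds the i-th value of the list, and slots past the end of the list are empty, since `GetValue` returns nothing there. Below is a rough Python sketch of that behavior. `list_to_array` is a hypothetical name, and the sketch assumes the list is delimited by carriage returns, which is how FileMaker represents ¶.

```python
def list_to_array(my_list, repetitions):
    """Slot i (0-based here, 1-based in FileMaker) gets value i of the list, or ""."""
    values = my_list.split("\r") if my_list else []
    return [values[i] if i < len(values) else "" for i in range(repetitions)]


print(list_to_array("M1\rM2\rM3", 5))  # ['M1', 'M2', 'M3', '', '']
```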