9 Replies Latest reply on May 28, 2016 1:10 AM by CamelCase_data

    Break a string in a value list of triplets

    psijmons

      I'm breaking my head over this and can't get it to work properly, and I'm sure this wheel has been invented before.

      I need to break up a DNA sequence string into a value list of triplets:

       

      GGATCCGCCACCATGAACGGCGACGACGCCTTCGCCAGACGGCCTACAGTGGGCGCCCAGATCCCCGAGAAGATCCAGAAGGCCTTCGACGATATC

      etc.

       

      should become:

      GGA

      TCC

      GCC

      ACC

      etc.

       

      Does anyone have a clever idea for a custom function?
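To pin down the expected output, here is a minimal sketch in Python (not FileMaker calculation code; Python is used only so it can be run anywhere). It assumes a value list is simply one triplet per line, with "\n" standing in for FileMaker's ¶, and that a final group of fewer than three characters is kept as-is. `to_triplets` is a hypothetical name.

```python
def to_triplets(seq: str) -> str:
    """Split seq into 3-character chunks, one per line (a value list)."""
    # Step through the string in strides of 3; the last chunk may be shorter.
    return "\n".join(seq[i:i + 3] for i in range(0, len(seq), 3))


print(to_triplets("GGATCCGCCACC"))  # prints GGA, TCC, GCC, ACC on separate lines
```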

        • 1. Re: Break a string in a value list of triplets
          siplus

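          Function DNAList (input)

          // Returns input as a return-separated list of 3-character values.
          Let (
          l = Length(input);
          Case (
                      // Fewer than 3 characters left: return them unchanged.
                      l < 3; input;
                      // Otherwise: the first triplet, then the rest split recursively.
                      // List() skips empty values, so an empty remainder adds no blank line.
                      List(Left(input;3); DNAList(Right(input; l-3)))
                )
          )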
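Here is a rough Python model of siplus's function (illustrative only; `dna_list` and its depth counter are not part of the original). Each call peels off one triplet and hands the rest to the next call. Because the recursive call sits inside List(), it is not a tail call, and the nesting depth grows by one level per triplet: about len/3 + 1 calls for the whole string.

```python
def dna_list(s: str, depth: int = 1) -> tuple[str, int]:
    """Model of DNAList: peel off one triplet, then recurse on the rest.

    Returns (value_list, max_depth) so the nesting depth can be inspected.
    """
    if len(s) < 3:
        # Base case, like Case(l < 3; input; ...): return the short remainder.
        return s, depth
    rest, max_depth = dna_list(s[3:], depth + 1)
    # Like FileMaker's List(), skip empty values so no trailing blank line appears.
    return "\n".join(v for v in (s[:3], rest) if v), max_depth


values, depth = dna_list("GGATCCGCCACC")
print(depth)  # prints 5: calls on 12, 9, 6, 3 and 0 characters
```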

          • 2. Re: Break a string in a value list of triplets
            CamelCase_data

            As DNA sequences can be very long, a simple recursive function may not be the right approach (at best, you can handle about 50,000 characters).

             

            I would start by estimating the longest DNA string you need to handle and outlining the context in which you need to do this, and then choose a parsing method based on that.

            In some cases, scripting with a loop can work: it's certainly the simplest method and can handle very long strings, though of course you can't use it in field definitions etc. Using JavaScript functions in a web viewer can be VERY fast, but it also has the inconvenience of not being part of the calculation engine.
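The looping approach can be sketched like this, in Python for illustration rather than as FileMaker script steps (in a FileMaker script the same idea would be a Loop / Exit Loop If / End Loop block that builds up a variable). `triplets_loop` is a hypothetical name; the point is that a loop keeps no call stack, so the string length is limited only by memory.

```python
def triplets_loop(seq: str) -> list[str]:
    """Split seq into triplets with a plain loop: no recursion, no depth limit."""
    values = []
    i = 0
    while i < len(seq):
        # Take the next 3 characters (fewer at the very end of the string).
        values.append(seq[i:i + 3])
        i += 3
    return values


# 3,000,000 characters: far longer than the recursion limits discussed in this thread.
big = "ACG" * 1_000_000
print(len(triplets_loop(big)))  # prints 1000000
```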

            A custom function like CustomList (https://www.briandunning.com/cf/881) will work for longer strings than a simple recursive function; if you tailor your own custom function based on the principles of CustomList, it should be possible to improve performance.

            There are, of course, also the options of shell scripting and/or plugins.

            • 3. Re: Break a string in a value list of triplets
              psijmons

              Brilliant, siplus, just what I was looking for, thanks!

              You may want to post this on Brian Dunning's site with the keyword DNA.

               

              @David: this is for peptides only, so the strings are not very long.

              • 5. Re: Break a string in a value list of triplets
                psijmons

                Ah, thanks Beverly, I had forgotten about that site.

                Whenever I think of custom functions, my burned-in reflex is briandunning.com, but fmfunctions is now back on my radar.

                • 6. Re: Break a string in a value list of triplets
                  TomHays

                  This variation will handle more than 510,000 characters in the sequence. (Tested with FMP11.)

                  The optimizations that let it handle larger sequences also make it faster than the simple function on shorter sequences.

                   

                  Custom Function: StringToTripletList(string)

                   

                  Let(
                  [
                  l = Length(string);
                  d = Div(l;9);
                  bl = d*3;
                  r = Mod(l;9)
                  ];
                  Case(
                     l ≤ 3; string;
                     l ≤ 6; List(
                                  Left(string;3);
                                  Right(string; l-3)
                               );
                     l ≤ 9; List(
                                  Left(string;3);
                                  Middle(string; 1*3+1; 3);
                                  Right(string; l - 2*3)
                               );
                     l ≤ 12; List(
                                  Left(string;3);
                                  Middle(string; 1*3+1; 3);
                                  Middle(string; 2*3+1; 3);
                                  Right(string; l - 3*3)
                               );
                     l ≤ 15; List(
                                  Left(string;3);
                                  Middle(string; 1*3+1; 3);
                                  Middle(string; 2*3+1; 3);
                                  Middle(string; 3*3+1; 3);
                                  Right(string; l - 4*3)
                               );
                     l ≤ 21; List(
                                  Left(string;3);
                                  Middle(string; 1*3+1; 3);
                                  Middle(string; 2*3+1; 3);
                                  Middle(string; 3*3+1; 3);
                                  Middle(string; 4*3+1; 3);
                                  Middle(string; 5*3+1; 3);
                                  Right(string; l - 6*3)
                               );
                     l ≤ 24; List(
                                  Left(string;3);
                                  Middle(string; 1*3+1; 3);
                                  Middle(string; 2*3+1; 3);
                                  Middle(string; 3*3+1; 3);
                                  Middle(string; 4*3+1; 3);
                                  Middle(string; 5*3+1; 3);
                                  Middle(string; 6*3+1; 3);
                                  Right(string; l - 7*3)
                               );
                     l ≤ 27; List(
                                  Left(string;3);
                                  Middle(string; 1*3+1; 3);
                                  Middle(string; 2*3+1; 3);
                                  Middle(string; 3*3+1; 3);
                                  Middle(string; 4*3+1; 3);
                                  Middle(string; 5*3+1; 3);
                                  Middle(string; 6*3+1; 3);
                                  Middle(string; 7*3+1; 3);
                                  Right(string; l - 8*3)
                               );
                     List(
                        StringToTripletList(Left(string;bl));
                        StringToTripletList(Middle(string;bl+1;bl));
                        StringToTripletList(Middle(string;2*bl+1;bl));
                        StringToTripletList(Right(string;l - 3*bl))
                     )
                  ) // Case
                  ) // Let

                   

                   

                  -Tom
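To check the idea outside FileMaker, here is a Python model of the divide-and-conquer scheme (illustrative; this `string_to_triplet_list` is a Python stand-in, not Tom's code). It follows the structure of the posted function: strings of up to 27 characters are split directly, and longer ones are cut into three blocks of bl = 3 * Div(l; 9) characters plus the last Mod(l; 9) characters. Because bl is a multiple of 3, no triplet straddles a block boundary. Each level shrinks the blocks to roughly a third, so the nesting depth grows roughly logarithmically with the length instead of by one level per triplet; the function returns the depth so this can be measured.

```python
def string_to_triplet_list(s: str, depth: int = 1) -> tuple[list[str], int]:
    """Divide-and-conquer triplet split; returns (values, max_depth)."""
    l = len(s)
    if l <= 27:
        # Base cases: the unrolled Case() branches, up to 9 triplets, no recursion.
        return [s[i:i + 3] for i in range(0, l, 3)], depth
    bl = (l // 9) * 3  # bl = Div(l; 9) * 3: a multiple of 3, and 3 * bl <= l
    # Three blocks of bl characters, then the remaining l mod 9 characters.
    pieces = (s[:bl], s[bl:2 * bl], s[2 * bl:3 * bl], s[3 * bl:])
    values, max_depth = [], depth
    for piece in pieces:
        part, d = string_to_triplet_list(piece, depth + 1)
        values.extend(part)
        max_depth = max(max_depth, d)
    return values, max_depth


values, depth = string_to_triplet_list("ACG" * 200_000)
print(len(values), depth)  # prints 200000 11
```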

                  • 7. Re: Break a string in a value list of triplets
                    CamelCase_data

                    TomHays, that one looks very promising. A great example of optimization.

                    I think you may have missed the "18" case?

                    • 8. Re: Break a string in a value list of triplets
                      TomHays

                      You are correct: the 18 case was omitted unintentionally. Thank you for spotting that.

                      Here is the corrected custom function.

                       

                      Note that this does not use tail recursion. I tried a tail-recursive version after this one, but found that it silently gave a different answer for very large strings. This function appears to be robust until it reaches the recursion limit (somewhere between 585,000 and 611,000 characters) and returns '?' beyond that.

                       

                      -Tom

                       

                      // StringToTripletList(string)
                      //
                      // Splits string into a list with 3 characters per line.
                      //
                      Let(
                      [
                      l = Length(string);
                      d = Div(l;9);
                      bl = d*3;
                      r = Mod(l;9)
                      ];
                      Case(
                         // Up to 27 characters (9 triplets): split directly, without recursion.
                         l ≤ 3; string;
                         l ≤ 6; List(
                                      Left(string;3);
                                      Right(string; l-3)
                                   );
                         l ≤ 9; List(
                                      Left(string;3);
                                      Middle(string; 1*3+1; 3);
                                      Right(string; l - 2*3)
                                   );
                         l ≤ 12; List(
                                      Left(string;3);
                                      Middle(string; 1*3+1; 3);
                                      Middle(string; 2*3+1; 3);
                                      Right(string; l - 3*3)
                                   );
                         l ≤ 15; List(
                                      Left(string;3);
                                      Middle(string; 1*3+1; 3);
                                      Middle(string; 2*3+1; 3);
                                      Middle(string; 3*3+1; 3);
                                      Right(string; l - 4*3)
                                   );
                         l ≤ 18; List(
                                      Left(string;3);
                                      Middle(string; 1*3+1; 3);
                                      Middle(string; 2*3+1; 3);
                                      Middle(string; 3*3+1; 3);
                                      Middle(string; 4*3+1; 3);
                                      Right(string; l - 5*3)
                                   );
                         l ≤ 21; List(
                                      Left(string;3);
                                      Middle(string; 1*3+1; 3);
                                      Middle(string; 2*3+1; 3);
                                      Middle(string; 3*3+1; 3);
                                      Middle(string; 4*3+1; 3);
                                      Middle(string; 5*3+1; 3);
                                      Right(string; l - 6*3)
                                   );
                         l ≤ 24; List(
                                      Left(string;3);
                                      Middle(string; 1*3+1; 3);
                                      Middle(string; 2*3+1; 3);
                                      Middle(string; 3*3+1; 3);
                                      Middle(string; 4*3+1; 3);
                                      Middle(string; 5*3+1; 3);
                                      Middle(string; 6*3+1; 3);
                                      Right(string; l - 7*3)
                                   );
                         l ≤ 27; List(
                                      Left(string;3);
                                      Middle(string; 1*3+1; 3);
                                      Middle(string; 2*3+1; 3);
                                      Middle(string; 3*3+1; 3);
                                      Middle(string; 4*3+1; 3);
                                      Middle(string; 5*3+1; 3);
                                      Middle(string; 6*3+1; 3);
                                      Middle(string; 7*3+1; 3);
                                      Right(string; l - 8*3)
                                   );
                         // Longer strings: three blocks of bl characters (bl is a multiple of 3,
                         // so no triplet is split) plus the last r = Mod(l;9) characters,
                         // each split recursively.
                         List(
                            StringToTripletList(Left(string;bl));
                            StringToTripletList(Middle(string;bl+1;bl));
                            StringToTripletList(Middle(string;2*bl+1;bl));
                            StringToTripletList(Right(string;l - 3*bl))
                         )
                      ) // Case
                      ) // Let
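Tom's remark about tail recursion refers to a different shape of function. As an illustration only (Python, with a hypothetical `dna_list_tail`; Tom's tail-recursive attempt was not posted), a tail-recursive version carries the list built so far in an extra parameter, so the recursive call is the final step instead of being wrapped in List(). Given Tom's report that his attempt silently returned a different answer for very large strings, it is worth comparing any such variant against a known-good version over many lengths. Python itself does not eliminate tail calls, so this sketch is still bounded by Python's default recursion limit of 1000 frames.

```python
def dna_list_tail(remaining: str, acc: str = "") -> str:
    """Tail-recursive split: the finished part of the list travels in `acc`.

    The recursive call is the last expression evaluated, with no work left
    to do after it returns, which is what makes it a tail call.
    """
    if remaining == "":
        return acc
    triplet = remaining[:3]
    # Append the next triplet to the accumulated list (no leading blank line).
    new_acc = triplet if acc == "" else acc + "\n" + triplet
    return dna_list_tail(remaining[3:], new_acc)


print(dna_list_tail("GGATCCGCCACC"))  # prints GGA, TCC, GCC, ACC on separate lines
```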

                      • 9. Re: Break a string in a value list of triplets
                        CamelCase_data

                        TomHays, yes, it performed really well in my tests as well!