13 Replies Latest reply on Oct 9, 2009 5:38 PM by raybaudi

    Is there a wild card for a text searchString?

    Hendrik

      Hi,

       

      I'm having problems using the following formula: 

       

      Position ( text ; searchString ; start ; occurrence )

       

      My goal is to search a text for a certain string. For example, I'd like to find the motif G*WE (in which "*" is any character) in the following text: AGASFGWERFW. I've tried several wildcards (for example G*WE, G@WE), but none of them works.

       

      Does anybody have a clue how to solve this?

       

      Thanks! 

        • 1. Re: Is there a wild card for a text searchString?
          raybaudi
            

          Hi

           

          I have a doubt...

          Given your example, what should the result be?

           

          (AGASFGWERFW does not contain any G*WE.)

          • 2. Re: Is there a wild card for a text searchString?
            comment_1
              

            raybaudi wrote:
            AGASFGWERFW does not contain any G*WE

            Actually, it does - twice (matches shown in brackets):

             

            A[GASFGWE]RFW

            AGASF[GWE]RFW

             

             

            @ Hendrik:

             

            You can use wildcards in Find mode, but not in calculations. It might be possible to construct a calculation to simulate a wild card (with some limitations). Or you might consider a plugin that supports regular expressions. However, my guess would be that the best solution is to restructure your data.
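
As a rough illustration of the regular-expression route: in most regex dialects a wildcard for exactly one character is written `.`, and "zero or more characters" is written `.*`. The sketch below is Python rather than FileMaker, and it does not show the API of any particular plugin. The function name `motif_position`, its parameters, and the test string AAGAWEA are made up for the example.

```python
import re

def motif_position(text, motif, wildcard="*", any_length=False):
    """Return the 1-based position of the first match of motif in text, or 0.

    Hypothetical helper for illustration only (not a FileMaker function).
    By default each wildcard stands for exactly one character; with
    any_length=True it stands for zero or more characters.
    """
    token = ".*" if any_length else "."
    # Escape literal characters and translate each wildcard into regex syntax.
    pattern = "".join(token if ch == wildcard else re.escape(ch) for ch in motif)
    match = re.search(pattern, text)
    return match.start() + 1 if match else 0

print(motif_position("AGASFGWERFW", "G*WE"))                   # 0
print(motif_position("AGASFGWERFW", "G*WE", any_length=True))  # 2
print(motif_position("AAGAWEA", "G*WE"))                       # 3
```

The first two calls show why the reading of the wildcard matters: the original example has no match when `*` means one character, but it does match when `*` means zero or more.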


            • 3. Re: Is there a wild card for a text searchString?
              Hendrik
                

              Hi,

               

              Thanks for thinking along.

              Indeed, I made a typing error. It should have been something like AAAAGAWEAAAAAA. I'm looking for a particular motif in a huge database with thousands of proteins (which are represented as long text strings).

               

              Cheers

              Hendrik

              • 4. Re: Is there a wild card for a text searchString?
                ChadAdams
                  

                Depending on what you are trying to do, I would tackle this slightly differently. The Position function can't use a wildcard the way you have it there. However, if it did work that way, one would expect it to return the position of the first G whenever WE follows the G in the text. You can achieve that same result with this calculation:

                 

                 

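                Let(
                [
                // position of the first "G" and of the first "WE" in the text
                posG = Position ( "AGASFGWERFW" ; "G" ; 1 ; 1 );
                posWE = Position ( "AGASFGWERFW" ; "WE" ; 1 ; 1 )
                ];
                // return posG only when both were found and the G comes before the WE
                Case( posWE > 0 and posG > 0 and (posG < posWE); posG)
                )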

                 

                • 5. Re: Is there a wild card for a text searchString?
                  raybaudi
                    

                  comment wrote:

                   

                  Actually, it does - twice (matches shown in brackets):

                   

                  A[GASFGWE]RFW

                  AGASF[GWE]RFW


                  None of those has one char between G and WE

                   

                  (but that was indeed the reason for my doubt)

                  • 6. Re: Is there a wild card for a text searchString?
                    comment_1
                      

                    raybaudi wrote:
                    None of those has one char between G and WE

                    No, but normally the * wildcard means zero or more characters.

                    • 7. Re: Is there a wild card for a text searchString?
                      raybaudi
                        

                      Hendrik wrote:

                       

                      It should have been something like AAAAGAWEAAAAAA


                      OK, but I need more examples...

                       

                      1) Does one * mean one character?

                      2) Do you sometimes need to search for G**E (or *AWE, or **WE)?

                      3) Are your searches always based on four characters (or more)?

                      4) Do you want to highlight (in red?) the found motif in the text?

                       

                      The better you explain, the better the solution will be.


                      • 8. Re: Is there a wild card for a text searchString?
                        Hendrik
                          

                        @Daniele

                         

                        1) One * means 1 character.

                        2) I only need to search for the G(any single character)WE motif.

                        --> So if we take two example strings, AAAGQWEAAAAEAS and MMASGAGTWEAAES, the outputs should be 4 and 7.

                         

                        3) My search strings are not always four characters long; there are also motifs like "G@W@L", but I believe such searches can easily be performed by adjusting the parameters of the search under "2)".

                         

                        Hendrik 

                        • 9. Re: Is there a wild card for a text searchString?
                          Hendrik
                            

                          @ Daniele:

                          4) No, red colour in the text is not necessary; the formula should only return the position of the first occurrence of the motif.

                          • 10. Re: Is there a wild card for a text searchString?
                            ChadAdams
                              

                            Hendrik wrote:

                            @Daniele

                             

                            1) One * means 1 character.

                            2) I only need to search for the G(any single character)WE motif.

                            --> So if we take two example strings, AAAGQWEAAAAEAS and MMASGAGTWEAAES, the outputs should be 4 and 7.

                             

                            3) My search strings are not always four characters long; there are also motifs like "G@W@L", but I believe such searches can easily be performed by adjusting the parameters of the search under "2)".

                             

                            Hendrik 


                              
                             

                            In that case, modify my calc from above to something like this:

                             

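                            Let(
                            [
                            posG = Position ( "AGASFGWERFW" ; "G" ; 1 ; 1 );
                            posWE = Position ( "AGASFGWERFW" ; "WE" ; 1 ; 1 )
                            ];
                            // exactly one character lies between the G and the WE when posWE - posG = 2
                            // (only the first G and the first WE are compared, so a match at a later G is missed)
                            Case( posWE > 0 and posG > 0 and (posWE - posG = 2); posG)
                            )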

                             

                            Or you could do something like:

                             

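                            Let(
                            [
                            // the four characters starting at the first "G"
                            txtToCheck = Middle ( "AGASFGWERFW" ; Position ( "AGASFGWERFW" ; "G" ; 1 ; 1 ) ; 4 );
                            g = Left ( txtToCheck ; 1 );
                            we = Right ( txtToCheck ; 2 )
                            ];
                            // the second character is not tested, so it acts as the wildcard
                            Case( g = "G" and we = "WE" ; "Flag On" ; "Flag Off" )
                            )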

                             

                            The point being that you should be able to get where you want to go by using the text parsing functions.

                             

                            --

                            Chad Adams

                            chad.adams@skeletonkey.com 

                             

                             


                            • 11. Re: Is there a wild card for a text searchString?
                              philmodjunk
                                

                              I'm assuming here that the letters are always upper case and only letters of the English alphabet...

                               

                              26 calculations set to return text, each returning its letter only when that letter is not in the motif:

                              ltrA : If ( Position ( YourMotifField ; "A" ; 1 ; 1 ) ; "" ; "A" )

                              ltrB : If ( Position ( YourMotifField ; "B" ; 1 ; 1 ) ; "" ; "B" )

                               

                              Use a Substitute function to turn all letters not in the motif into placeholder @ symbols:

                               

                              SubbedText: Substitute ( ProteinPatternField ; [ltrA ; "@"] ; [ltrB ; "@"] ; ... complete the series ... [ltrZ ; "@"] )

                               

                              Now your Position function will return the number you are looking for:

                               

                              Position ( SubbedText ; YourMotifField ; 1 ; 1 )

                               

                              This is pretty ugly, but it should work. Perhaps Comment has a more elegant approach?
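
The masking idea above can be tried outside FileMaker with a few lines of Python. This is only a sketch of the logic, not FileMaker syntax; the function name `masked_position` and the test string AGGWEA are invented for the example. It also exposes one limitation: when the character in the wildcard position happens to be a letter of the motif, it is not masked, so the plain search misses that match.

```python
def masked_position(protein, motif, wildcard="@"):
    """Mask every character that is not a literal motif letter, then search.

    Hypothetical helper mirroring the Substitute + Position approach.
    Returns the 1-based position of the first hit, or 0 if there is none.
    """
    literals = set(motif) - {wildcard}
    masked = "".join(ch if ch in literals else wildcard for ch in protein)
    # str.find returns -1 when nothing is found, so adding 1 yields 0.
    return masked.find(motif) + 1

print(masked_position("AAAGQWEAAAAEAS", "G@WE"))  # 4
print(masked_position("MMASGAGTWEAAES", "G@WE"))  # 7
# Limitation: GGWE fits G@WE, but the second G is a motif letter and stays unmasked.
print(masked_position("AGGWEA", "G@WE"))          # 0
```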

                              • 12. Re: Is there a wild card for a text searchString?
                                comment_1
                                  

                                Hendrik wrote:

                                 

                                My search strings are not always four characters long; there are also motifs like "G@W@L", but I believe such searches can easily be performed by adjusting the parameters of the search under "2)".


                                I am afraid that is far from the truth. The calculation is pretty complex even with one single-character wildcard. For example, to find the pattern "A@B" you must find the first occurrence of "A" and test if the character after next is "B". If not, you must proceed to the next occurrence of "A" and repeat the test - which means this requires either a recursive custom function or a scripted loop.

                                If the number of wildcards is not known in advance (or at least limited to a very low number), you will need another (inner) loop just to perform the test. If, for example, the pattern is "A@B@C", you must find an occurrence of "A", skip the wildcard, test the character, skip the next wildcard, etc...

                                A third (outer) loop will be required if you want to find the n-th occurrence of a match.


                                As Daniele said, it would be best to formulate the requirements as precisely as possible, because every little nuance could make this significantly simpler - or even more complex.
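
To make the loop structure described above concrete, here is a minimal sketch in Python (not FileMaker syntax). It is a simplified variant: it tries every start position in turn rather than jumping from one occurrence of the first literal to the next. The function name `wildcard_position`, the `occurrence` parameter, and the test strings other than Hendrik's two examples are assumptions made for illustration.

```python
def wildcard_position(text, motif, occurrence=1, wildcard="@"):
    """1-based start of the occurrence-th match of motif in text, or 0.

    Hypothetical helper; each wildcard stands for exactly one character.
    Overlapping matches are counted separately.
    """
    if not motif:
        return 0
    found = 0
    # Scan over every start position where the motif still fits.
    for start in range(len(text) - len(motif) + 1):
        # Inner test: compare character by character, skipping wildcards.
        if all(m == wildcard or m == text[start + k] for k, m in enumerate(motif)):
            found += 1  # counting matches gives the n-th occurrence
            if found == occurrence:
                return start + 1
    return 0

print(wildcard_position("AAAGQWEAAAAEAS", "G@WE"))      # 4
print(wildcard_position("MMASGAGTWEAAES", "G@WE"))      # 7
print(wildcard_position("AAGXWYLA", "G@W@L"))           # 3
print(wildcard_position("AGAGAG", "G@G", occurrence=2)) # 4
```

In FileMaker the same structure would have to be expressed as a scripted loop or a recursive custom function, as noted above.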




                                • 13. Re: Is there a wild card for a text searchString?
                                  raybaudi
                                    

                                  Comment is right.

                                   

                                  So we start with 3 fields:

                                   

                                  Protein sequence <-- text field
                                  Motif <-- text field
                                  Position <-- text field

                                   

                                  and create a script (note that this is good only for @AAAAA and A@AAAAA motifs) like this:

                                   

                                   


                                  Set Field [ Proteins::Position; "" ]
                                  Set Variable [ $wild1pos; Value: Position ( Proteins::Motif ; "@" ; 1 ; 1 ) ]
                                  Set Variable [ $wild2pos; Value: Position ( Proteins::Motif ; "@" ; 1 ; 2 ) ]
                                  # motif:@AAAAA
                                  # (the result is the position of the first literal character, one past the wildcard)
                                  If [ $wild1pos = 1 and not $wild2pos ]
                                    Set Field [ Proteins::Position; Position ( Replace ( Proteins::Protein sequence ; 1 ; 1 ; " " ) ; Right ( Proteins::Motif ; Length ( Proteins::Motif ) - 1 ) ; 1 ; 1 ) ]
                                  End If
                                  # motif:A@AAAAA
                                  If [ $wild1pos = 2 and not $wild2pos ]
                                    Set Variable [ $char1; Value: Left ( Proteins::Motif ; 1 ) ]
                                    Set Variable [ $rest; Value: Right ( Proteins::Motif ; Length ( Proteins::Motif ) - 2 ) ]
                                    Set Variable [ $i; Value: 1 ]
                                    Loop
                                      If [ Middle ( Proteins::Protein sequence ; $i ; 1 ) = $char1 and Middle ( Proteins::Protein sequence ; $i + 2 ; Length ( Proteins::Motif ) - 2 ) = $rest ]
                                        Set Field [ Proteins::Position; $i ]
                                      End If
                                      Set Variable [ $i; Value: $i + 1 ]
                                      Exit Loop If [ not IsEmpty ( Proteins::Position ) or $i > Length ( Proteins::Protein sequence ) ]
                                    End Loop
                                  End If


                                   

                                  You can add more motifs.

                                   

                                  Edit: correction of a variable name