19 Replies Latest reply on Feb 27, 2012 2:37 PM by ocolamonici

    Can Wild Card be used in a Find/Replace?

    ocolamonici

      I need to set a $$Var to Get ( ActiveSelectionStart ) & " to " & Get ( ActiveSelectionStart ) + ( Get ( ActiveSelectionSize ) - 1 ) after a search in which I need to use the "any one character" wildcard (@). I noticed that Find/Replace selects the found string in the field, whereas a regular Find, where I can use wildcards, does not. Is there a way to use a wildcard with Find/Replace?
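For readers outside FileMaker, the arithmetic in the question can be sketched in Python. This is only an illustration, not FileMaker code: `first_match` and `selection_label` are hypothetical helper names, and the sample strings are made up. Positions are 1-based, as in Get ( ActiveSelectionStart ).

```python
def first_match(text: str, pattern: str, wildcard: str = "@"):
    """Return (start, size) of the first match, 1-based, or None.

    The wildcard stands for exactly one character of any kind.
    """
    n = len(pattern)
    for i in range(len(text) - n + 1):
        window = text[i:i + n]
        if all(p == wildcard or p == c for p, c in zip(pattern, window)):
            return i + 1, n
    return None


def selection_label(start: int, size: int) -> str:
    # Same arithmetic as the $$Var expression: start & " to " & start + (size - 1)
    return f"{start} to {start + (size - 1)}"


hit = first_match("GGATCAGGT", "ATC@GG")  # made-up sample data
print(selection_label(*hit))  # 3 to 8
```

The label is the same "start to start + size - 1" string that Find/Replace would give through the selection functions, but computed from a wildcard-aware match.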

      I have worked around this using a custom function to get the positions; however, using Find/Replace would be easier and more accurate.

      Thanks in advance for your help.

       

      Oscar

        • 1. Re: Can WIld Card be used in a Find/Replace?
          Stephen Huston

          What are you trying to accomplish?

           

          ( I always get the willies when someone wants to use Find/Replace... )

          • 2. Re: Can WIld Card be used in a Find/Replace?
            ocolamonici

            Stephen,

            I am trying to find the correct match for DNA sequences while allowing the user to include "Ns", which stand for any nucleotide, in the query. In other words, the user enters a stretch of 15-20 bases, which may include Ns, to search a database of genes. The problem is that DNA strings are built from only 4 letters, so short stretches can be repeated many times on the targets. For that reason, I am not happy parsing the sequence from one end to the other, since errors are possible (landing on the wrong stretch). Using wildcard(s) within the context of the entire query is more accurate. If the regular Find (using @ instead of Ns) were to select the found string, everything would be perfect for me.

            Thanks for your help

            Oscar

            • 3. Re: Can WIld Card be used in a Find/Replace?
              mbraendle

              Two questions:

               

              • why are you not using a professional tool such as BLAST for nucleotide searches?
              • have you tried out menu Records > Replace Field Contents, which is much more powerful than Find/Replace?
              • 4. Re: Can WIld Card be used in a Find/Replace?
                ocolamonici

                The reason I do not use BLAST is that I am just creating an app that searches for sequences stored within the app itself. I just want to add a feature that searches for sequences using N (as any nucleotide) and retrieves the location within the gene. The searches work fine with Ns in a regular Find (the Ns are substituted by @); however, it is hard to get the location (from nucleotide XX to nucleotide YY) within the target gene, because the regular Find does not let you retrieve the location of the found string. I got the location finder to work using a custom function; however, it could be problematic, because parsing from one end of the target to the other can land you in the wrong location. For example, if the user enters ATCNGGTCNGGG, parsing from either end (e.g. ATCN) can land you in any stretch containing ATC in the sequences stored in the database.
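The risk described above can be illustrated with a small Python sketch. It is not the custom function from this thread. The target string is invented for the example, and `find_all` is a hypothetical helper that tests the whole query at every position, treating N as any single base.

```python
def find_all(target: str, query: str, wildcard: str = "N"):
    """Return every (from, to) position pair, 1-based, where the whole query fits."""
    n = len(query)
    hits = []
    for i in range(len(target) - n + 1):
        window = target[i:i + n]
        if all(q == wildcard or q == t for q, t in zip(query, window)):
            hits.append((i + 1, i + n))
    return hits


target = "TTATCAAGGATCTGGTCAGGGCC"  # made-up gene fragment
query = "ATCNGGTCNGGG"

# Anchoring on the leading stretch alone lands on the first ATC ...
naive_start = target.find("ATC") + 1
print(naive_start)  # 3

# ... while matching the entire query finds the real location.
print(find_all(target, query))  # [(10, 21)]
```

Checking the full query at each offset is slower than anchoring on one end, but it cannot land on a stretch that only shares the first few bases.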

                 

                I have not tried Replace Field Contents, however. That is a good alternative. Thanks for the idea.

                Oscar

                • 5. Re: Can WIld Card be used in a Find/Replace?
                  comment

                  I am still puzzled regarding what you're trying to accomplish. Earlier you said:

                   

                  Oscar Colamonici wrote:

                   

                  If the regular Find (using @ instead of Ns) were to select the found string, everything would be perfect for me.

                   

                  I don't think this should be too difficult to do using a looping script or a custom function - though naturally, if more than one occurrence of the search string is found, you will need another recursion to highlight all of them at once (it cannot be done by selecting, since FileMaker does not support non-contiguous selection, AFAIK).

                   

                  I am also curious how Replace Field Content can assist in finding.

                   

                  Message was edited by: Michael Horak

                  • 6. Re: Can WIld Card be used in a Find/Replace?
                    mbraendle

                    Michael,

                     

                    correct, Replace Field Contents does not assist in finding (it only helps in highlighting the results, and a custom function would be better for that). Sorry, Oscar, for sending you down the wrong track. I was thinking rather of replacing repeating features of the string with a given pattern (a calculation field is preferred over Replace Field Contents). The resulting string can then be used for the find. E.g. if you have something like ATCCGGGGTCGA, the new string would be AT{C}2{G}4TCGA. (Or if you have something like GCGCGCCATG, the new string would be {GC}3CATG.) You can then search e.g. with *{G}4*, which would match those parts that have exactly 4 G's in a row and not 5 or more.
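The single-base case of this encoding can be sketched in Python as follows. This is a simplified illustration with a hypothetical function name, `collapse_runs`. It collapses only runs of one repeated base, so it reproduces AT{C}2{G}4TCGA but not multi-base repeats such as {GC}3.

```python
from itertools import groupby


def collapse_runs(seq: str) -> str:
    """Rewrite each run of a repeated base as {base}count; single bases stay as-is."""
    parts = []
    for base, run in groupby(seq):
        count = len(list(run))
        parts.append(base if count == 1 else f"{{{base}}}{count}")
    return "".join(parts)


print(collapse_runs("ATCCGGGGTCGA"))  # AT{C}2{G}4TCGA
```

Because a run of five G's would be written {G}5, a search for {G}4 matches only runs of exactly four.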

                     

                    Oscar, I would try to solve your problem differently, by splitting your sequences into small chunks, so-called n-grams. E.g. for 3-grams, the sequence ATCCGGGGTCGA would be split as

                    ATC CGG GGT CGA 

                    TCC GGG GTC GA    (shifted to left by 1 nucleotide)

                    CCG GGG TCG A      (shifted to left by 2 nucleotides)

                     

                    This circumvents one problem - the restriction for the length of the character sequence that is used in the index (100 characters), which before could be a problem if your sequences were longer than 100 characters - but introduces a new one, since one loses the sequence in which the n-grams are ordered for finds. Hence, one has to number the individual n-grams to reinsert order:

                     

                    1_ATC 2_CGG 3_GGT 4_CGA

                    1_TCC 2_GGG 3_GTC 4_GA

                    1_CCG 2_GGG 3_TCG 4_A

                    1_CGG 2_GGT 3_CGA

                    1_GGG 2_GTC 3_GA

                    .

                    .

                    .

                     

                    I would then store each of these n-gram fragments in a separate (related) table - for each line one record -  together with the key of the parent sequence and the line number of each fragment.

                    E.g. a sequence with 200 nucleotides will result in 200 fragment records.
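A minimal Python sketch of how the numbered fragment lines could be generated, assuming one line per starting offset and 3-grams as in the example above. `fragment_lines` is a hypothetical name; in FileMaker each returned line would become one record in the related table.

```python
def fragment_lines(seq: str, n: int = 3):
    """Build one numbered n-gram line per starting offset of the sequence."""
    lines = []
    for offset in range(len(seq)):
        chunks = [seq[i:i + n] for i in range(offset, len(seq), n)]
        lines.append(" ".join(f"{k}_{c}" for k, c in enumerate(chunks, start=1)))
    return lines


lines = fragment_lines("ATCCGGGGTCGA")
print(len(lines))  # 12, one line per nucleotide
print(lines[0])    # 1_ATC 2_CGG 3_GGT 4_CGA
print(lines[4])    # 1_GGG 2_GTC 3_GA
```

The line number doubles as the 1-based start position of the first fragment on that line, which is what makes the positions recoverable later.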

                     

                    Similarly, you split the find string, e.g. GGGGT --> 1_GGG 2_GT ( or GGG@T --> 1_GGG 2_@T) and do the search in the fragment table. For the example above the result set will be the key of the parent and the record with the 5th line. If the searched fragment occurs multiple times you will get all the line records that match. And with the line number you immediately have the positions.
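The lookup described here might be sketched in Python as below. This is an in-memory stand-in for the fragment table, not a FileMaker find: `chunks`, `chunk_matches` and `search` are hypothetical names, @ matches any one character, and the last query chunk is allowed to be a prefix of the stored fragment.

```python
def chunks(s: str, offset: int = 0, n: int = 3):
    """Split s into n-grams starting at the given offset."""
    return [s[i:i + n] for i in range(offset, len(s), n)]


def chunk_matches(q: str, frag: str, wildcard: str = "@") -> bool:
    # A query chunk may be shorter than the stored fragment (prefix match).
    return len(q) <= len(frag) and all(
        a == wildcard or a == b for a, b in zip(q, frag)
    )


def search(seq: str, query: str, n: int = 3):
    """Return the line numbers (= 1-based start positions) whose fragments match."""
    wanted = chunks(query, 0, n)
    hits = []
    for offset in range(len(seq)):
        line = chunks(seq, offset, n)
        if len(wanted) <= len(line) and all(
            chunk_matches(q, frag) for q, frag in zip(wanted, line)
        ):
            hits.append(offset + 1)
    return hits


print(search("ATCCGGGGTCGA", "GGGGT"))  # [5]
print(search("ATCCGGGGTCGA", "GGG@T"))  # [5]
```

With the line number 5 and a query length of 5, the match runs from nucleotide 5 to nucleotide 9.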

                     

                    (With regards to BLAST searching, I'm quite sure that it offers search features such as searching with wild cards or with repeating patterns such as {ATC}3. The link I gave also  includes the C++ and C code.)

                    • 7. Re: Can WIld Card be used in a Find/Replace?
                      comment

                      I think it would be helpful to know what is the length of a sequence (typical and maximum) and what is the length of a search string (also typical and maximum).

                       

                      I am not sure indexing makes a difference here, since a *sequ@nce* search is not going to use the index anyway.

                      • 8. Re: Can WIld Card be used in a Find/Replace?
                        ocolamonici

                        Michael and Martin,

                        The typical length of the query string, for the purpose of this app, is probably between 10 and 30-35 characters. The target can be of variable length, since it corresponds to the sequence of a gene (from a couple of hundred to several thousand characters, with very few longer than 10,000). A query with only 1 N is not a problem; that case is already solved. The issue is more than one N. Your suggestions gave me an idea. The Ns can be detected as occurrences within the Query using PatternCount. I can make a custom function that takes the stretch from the beginning of the Query to the first N and the stretch from the last N to the end of the Query. I can look for these 2 stretches on the targets, and when the number of characters that separates them on the target is the same as in the Query, it is very likely that I have landed on the right place on the target.
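A rough Python sketch of this two-flank idea, under the assumption that only the stretch before the first N and the stretch after the last N are compared. `flank_candidates` is a hypothetical name and the sample target is invented. Note that anything between the first and last N is not checked, which is why the result is only "very likely" correct.

```python
def flank_candidates(target: str, query: str, wildcard: str = "N"):
    """Find places where the head and tail of the query sit the right distance apart."""
    first = query.find(wildcard)
    last = query.rfind(wildcard)
    if first == -1:
        raise ValueError("query contains no wildcard")
    head = query[:first]       # from the beginning of the query to the first N
    tail = query[last + 1:]    # from the last N to the end of the query
    hits = []
    start = target.find(head)
    while start != -1:
        tail_at = start + last + 1
        fits = start + len(query) <= len(target)
        if fits and target[tail_at:tail_at + len(tail)] == tail:
            hits.append((start + 1, start + len(query)))
        start = target.find(head, start + 1)
    return hits


print(flank_candidates("TTATCAAGGATCTGGTCAGGGCC", "ATCNGGTCNGGG"))  # [(10, 21)]

# The middle of the query is ignored: the C in ANCNT is never compared.
print(flank_candidates("AGGGT", "ANCNT"))  # [(1, 5)]
```

The second call shows the weak spot: a candidate can be reported even though a fixed base between the two Ns does not match, so each candidate would still need a full comparison.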

                        What do you think?

                        Best,

                        Oscar

                        • 9. Re: Can WIld Card be used in a Find/Replace?
                          comment

                          Would something like this work for you? I don't think it will win any speed contest*, but at least it should provide the expected result (for the first occurrence only).

                           

                           

                          ---

                          (*) Perhaps you should look into incorporating regex through a plugin or OS-level scripting.
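To show what the regex route could look like outside FileMaker, here is a short Python sketch using the standard `re` module. It is not the attached custom function, and `regex_hits` is a hypothetical name. Each N becomes `.` (any single character), and a lookahead is used so that overlapping occurrences are all reported.

```python
import re


def regex_hits(target: str, query: str, wildcard: str = "N"):
    """Return all (from, to) positions, 1-based, where the query matches."""
    pattern = "".join("." if ch == wildcard else re.escape(ch) for ch in query)
    # The lookahead matches at every start position without consuming text,
    # so overlapping occurrences are not skipped.
    return [
        (m.start() + 1, m.start() + len(query))
        for m in re.finditer(f"(?=({pattern}))", target)
    ]


print(regex_hits("TTATCAAGGATCTGGTCAGGGCC", "ATCNGGTCNGGG"))  # [(10, 21)]
print(regex_hits("AAAA", "ANA"))  # [(1, 3), (2, 4)]
```

Unlike a first-occurrence function, this returns every occurrence, which matters for short queries that repeat on the target.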

                          • 10. Re: Can WIld Card be used in a Find/Replace?
                            mbraendle

                            That's why I had proposed the n-gram scheme, which partially can use the index for the n-grams that don't contain a @.

                             

                            BTW, there must be a mistake in your custom function for highlighting; carry out a find with "NNA", then you see that the wrong part is highlighted.

                            • 11. Re: Can WIld Card be used in a Find/Replace?
                              mbraendle

                              You might run into problems if you have more than 2 Ns.

                              • 12. Re: Can WIld Card be used in a Find/Replace?
                                comment

                                MartinBraendle wrote:


                                carry out a find with "NNA", then you see that the wrong part is highlighted.

                                 

                                "NNA" is not a valid search string here.

                                 

                                 

                                MartinBraendle wrote:

                                 

                                That's why I had proposed the n-gram scheme, which partially can use the index for the n-grams that don't contain a @.

                                 

                                I still don't see how this scheme leads to the required result.

                                • 13. Re: Can WIld Card be used in a Find/Replace?
                                  ocolamonici

                                  Martin,

                                  I am not sure if I missed something, but it also works with NNA (which would be an uncommon search, since usually we'd use 10 or more characters). The issue with such a short string is that it will show the first occurrence. Your model was interesting and I learned a lot from it. Thanks!

                                   

                                  Michael,

                                   

                                  I tested it with your "made-up genes" and it worked great; therefore, it should work with real genes too (I will test it tomorrow).

                                  I had been trying to get this done for a couple of days and was getting only a partial result. Arriving at your solution (or Martin's) would have taken me at least a week.

                                  I really appreciate the interest and help that experienced people like you two provide to newbies like me.

                                   

                                  Best,

                                   

                                  Oscar

                                  • 14. Re: Can WIld Card be used in a Find/Replace?
                                    comment

                                    Michael Horak wrote:

                                     

                                    "NNA" is not a valid search string here.

                                     

                                    Well, maybe it is - though I believe it would be a very rare requirement, and my CF does assume it won't happen. Perhaps I will take another look at it later.

                                     

                                     

                                    Oscar, the issue with "NNA" is not the length, but the fact that it begins with wild characters. As such, searching for it makes sense only to exclude the very beginning of the sequence.
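The point about leading wildcards can be checked with a tiny Python sketch. The sequence is made up and `matches_at` is a hypothetical helper. With N as any base, "NNA" matches wherever an A sits at position 3 or later, so the only thing it rules out is an A in the first two positions.

```python
def matches_at(target: str, query: str, start: int, wildcard: str = "N") -> bool:
    """True if the whole query fits the target at the 0-based index start."""
    window = target[start:start + len(query)]
    return len(window) == len(query) and all(
        q == wildcard or q == t for q, t in zip(query, window)
    )


target = "ACAGA"  # made-up sequence with A at positions 1, 3 and 5
hits = [i + 1 for i in range(len(target)) if matches_at(target, "NNA", i)]
a_positions = [i + 1 for i, base in enumerate(target) if base == "A"]

print(hits)         # [1, 3]
print(a_positions)  # [1, 3, 5]
```

Each hit ends on an A (positions 3 and 5 here); the A at position 1 is never the end of a match because two bases cannot precede it.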
