1 2 Previous Next 17 Replies Latest reply on May 1, 2012 5:19 AM by datastride

    Fastest way to compare two text strings (case sensitive) ..

    datastride

      OK, if I want to compare two text strings for equality in a situation where case matters (i.e. is "abc" equal to "ABC"), I can use the Exact() function.

       

      But how do I check to see if the first string is less than or greater than the second where case matters?

       

      This test returns FALSE: "abc" < "ABC"

       

      This test returns FALSE: "abc" > "ABC"

       

      Both of the above return FALSE, because "abc" = "ABC" returns TRUE in FileMaker land ...

       

      And, of course, Exact( "abc" , "ABC" ) returns FALSE, as expected.

       

      So what is the analog to the Exact() function for comparing for less than or greater than with case sensitivity?

       

      Thanks for any guidance ...

        • 1. Re: Fastest way to compare two text strings (case sensitive) ..
          Malcolm

          how do I check to see if the first string is less than or greater than the second where case matters?

           

           

          code("a") > code("A")

           

           

          Malcolm

          • 2. Re: Fastest way to compare two text strings (case sensitive) ..
            BruceHerbach

            Morgan,

             

            What do you mean by greater then or less then?  I can see how you would be testing to see if string1 is equal, or not equal to string2. 

             

            Using code to get the numeric values may be a help, but may lead to some confusion in the implementation.  For instance code("A") is 65 while code("a") is 97.

             

            You can also use combined characters,  but they come out in reverse order..  For example.. A = 65,  B = 66 and C = 67  but code("ABC") returns

            670006600065

            So in short,  I don't think this lends itself to a simple > or < comparison.

             

            Bruce

            • 3. Re: Fastest way to compare two text strings (case sensitive) ..
              TomHays

              In your case-sensitive comparison you neglected to specify which is smaller "a" or "A".

              I am assuming you want "a" before (less than) "A". As Bruce pointed out the Code() function gives a lower value for "A" so it does require a bit of fiddling to make it work.

               

              I think a full solution requires a recursive custom function, but if you limit yourself to the first N letters you can build up a calculated string which you can use with the stock ">" or "<".

               

              For example to case-sensitively compare the first 5 letters of a string you can use this function to generate one of the comparison strings. (It can be extended to more letters by following the pattern.) It will have "A" < "a" due to the default behavior of the Case function. It is also limited to characters that have a code value that is only three digits (e.g. characters in the ASCII set)

               

              /* Using Let() to make things more legible. */

              Let(

              [

              theString = yourTextField;

              cod1 = Right("00" & Code(Middle(theString; 1, 1)); 3);

              cod2 = Right("00" & Code(Middle(theString; 2, 1)); 3);

              cod3 = Right("00" & Code(Middle(theString; 3, 1)); 3);

              cod4 = Right("00" & Code(Middle(theString; 4, 1)); 3);

              cod5 = Right("00" & Code(Middle(theString; 5, 1)); 3)

              ];

              cod1 & cod2 & cod3 & cod4 & cod5

              )

               

               

              To get "a" < "A" we can switch the case of the letters.

              I think this function will do what you need to generate a 5-character string that can be compared case-sensitively with "abcd" < "Abcd".

               

              Let(

              [

              theString = yourTextField;

              val1 = Middle(theString; 1, 1);

              let1 = If(Code(val1)> 96; Upper(val1); Lower(val1));

              cod1 = Right("00" & Code(let1); 3);

               

               

              val2 = Middle(theString; 2, 1);

              let2 = If(Code(val2)> 96; Upper(val2); Lower(val2));

              cod2 = Right("00" & Code(let2); 3);

               

               

              val3 = Middle(theString; 3, 1);

              let3 = If(Code(val3)> 96; Upper(val3); Lower(val3));

              cod3 = Right("00" & Code(let3); 3);

               

               

              val4 = Middle(theString; 4, 1);

              let4 = If(Code(val4)> 96; Upper(val4); Lower(val4));

              cod4 = Right("00" & Code(let4); 3);

               

               

              val5 = Middle(theString; 5, 1);

              let5 = If(Code(val5)> 96; Upper(val5); Lower(val5));

              cod5 = Right("00" & Code(let5); 3)

               

              ];

              cod1 & cod2 & cod3 & cod4 & cod5

              )

               

               

               

              -Tom

              • 4. Re: Fastest way to compare two text strings (case sensitive) ..
                TomHays

                If you want a string comparison that only considers the upper/lower case of the first letter and handles all unicode characters properly, you can try this one.

                 

                /* Function that compares strings with case-sensitivity only on first letter */

                /* -1: str1 is less than str2 */

                /* 0: str1 is equal to str2 */

                /* 1: str1 is greater than str2 */

                Let([

                str1 = yourTextField1;

                str2 = yourTextField2;

                let1 = Left(str1;1);

                let2 = Left(str2;1)

                ];

                Case(

                Exact(str1; str2); 0;

                str1 < str2; -1;

                str1 > str2; 1;

                 

                 

                /* At this point the two strings are equal except for upper/lower case somewhere. */

                 

                 

                /* If they both have the same case first letter, return 0 = equal */

                Exact(let1; let2) ; 0;

                 

                 

                Exact(Lower(let1); let1); -1;

                1

                )

                )

                 

                 

                 

                -Tom

                • 5. Re: Fastest way to compare two text strings (case sensitive) ..
                  datastride

                  Bruce and Tom,

                   

                  Thanks for your responses. I guess I didn't fully explain my challenge ...

                   

                  By "greater than" or "less than" I am referring to the result of comparing the entire contents of two text strings according to the ASCII value of each character (don't need to deal with unicode),

                   

                  In other words, I am speaking about a straight string comparison as it would be done in any procedural programming language.

                   

                  A bit of background: I need to download a list of UID (unique IDs) of all the email messages on a POP3 server, sort them into ascending order, and then logically "merge" them with a similar sorted list of all unique IDs of messages previously downloaded from that server, to determine which messages on the server are "new" (not previously downloaded).

                   

                  The unique IDs assigned by various POP3 servers use digits and upper and lowercase letters. Thus I need a VERY FAST way to compare any two text strings to determine which is greater than the orther (based on ASCII values of characters therein, as explained above). Thus I need to differentiate between "GmailId136645ahj134ul53" and "GmailId136645Ahj134ul53". I really don't care which is greater than the other. I simply want the natural sort order to prevail so a "merge" routine will work properly. (In terms of ASCII, all uppercase letters sort ahead of all lowercase letters so the 2nd example string is technically "greater than" the 1st, but this really is of no consequence in my situation.)

                   

                  A recursive custom function that examines each character in each string is just too slow as this operation needs to compare sometimes thousands of string pairs and do this again and again every few minutes.

                   

                  Of course the hardware FileMaker is running on (x86 CPUs) provides machine instructions to do exactly what I want to do. But apparently FileMaker does not provide access to a straight test for greater than or less than. "After the execution of the x86 CMPSB instruction, you can test the flags using the standard conditional jump instructions. This lets you check for equality, inequality, less than, greater than, etc."

                   

                  FileMaker tries to protect us from ourselves by doing a lot of extra work so that "A" is thought equal to "a" and "á" (letter a with acute acent, in case something gets mangled) and "E" is though equal to "e" and "é" (letter e with acute acent). Though very helpful in many (even most) situations, I would think there must be a native FileMaker mechanism to bypass all this extra work that both wastes CPU cycles AND results in the wrong result in my particular case.

                   

                  So do I have to write a plug-in just to have access to a straight string compare (with very fast performance)?

                   

                  I was hoping someone knew of a native solution that would provide very, very good performance ...

                  • 6. Re: Fastest way to compare two text strings (case sensitive) ..
                    TomHays

                    A bit of background: I need to download a list of UID (unique IDs) of all the email messages on a POP3 server, sort them into ascending order, and then logically "merge" them with a similar sorted list of all unique IDs of messages previously downloaded from that server, to determine which messages on the server are "new" (not previously downloaded).

                     

                    Are you sure you need to sort and merge to determine which are new?

                     

                    Can you not just use FileMaker's built-in tools for determining if the item is in the list, i.e. does it match another record in a relationship using the UID as a key?  You would first need to convert the UIDs into a format that works around FileMaker's case-insensitivity. Code(UID) will work fine for that purpose. The Code() function would only need to be executed once for each message and stay in a stored calc.

                     

                    As far as where to merge the new messages into a list, I don't think the UID is intended to convey the delivery (or sending) order of the messages, but I could be wrong. I think you have to use the message's timestamp for that.

                     

                    -Tom

                    • 7. Re: Fastest way to compare two text strings (case sensitive) ..
                      AlanStirling

                      Hi Morgan

                       

                      If you set the index and sort order of the UID field to 'Unicode', then any sorts or comparisons will use the full unicode set, without the issue of Upper and Lowercase letters being sorted together.

                       

                      Here is a link to the FileMaker 11 help page:

                       

                           http://www.filemaker.com/11help/html/non_toc.45.2.html

                       

                      Once set, then you should be able to ignore this issue and treat the UID's as unique.

                       

                      Best wishes - Alan Stirling, London UK

                      1 of 1 people found this helpful
                      • 8. Re: Fastest way to compare two text strings (case sensitive) ..
                        datastride

                        Tom,

                         

                        Thanks very much for taking so much time to respond. I am most grateful. But in my situation, I really need help with exactly that for which I was asking originally: namely, a way to compare two strings without all the massaging and aliasing that FileMaker's greater-than and less-than operators add to the mix.

                         

                        Here's the rub: Because I need to compare a list of possibly several thousand UIDs on a server with several thousand UIDs in a table in the database, I cannot afford the time to do a find on each UID on the server against the UIDs in the database. that's just way too slow. I am after something much more sophisticated that will run at least an order of magnitude faster.

                         

                        So I have chosen to download a list of UIDs of all messages on the POP3 server (something that happens very quickly, as POP3 defines a command to do this and the lovely folks at CNS have added this capability to their POP3it plug-in). Then I perform an in-memory sort (using a pair of custom functions). This sort usually happens very quickly because on many POP3 servers the UID are assigned "sequentially". OK, so now I have the list of server UIDs in sorted order in just a few seconds (literally 3 to 4 seconds for 2500 UIDs).

                         

                        Getting a list of the UIDs of previously downloaded messages is even easier. I just use the List() function pointed from my Email_Folders table to a Email_Unique_IDs table (with a sort specified for the relationship based on the UID field). Almost instantaneous to get a couple of thousand UIDs in sorted order.

                         

                        Next I perform some "matching" logic to step thru both lists together. When I find a UID in the server list that is not in the database list, wel then, that's a message I need to download.

                         

                        Why the need for speed? Because this script runs server-side and checks for new email for all users in the database. If we have 200 users in the database configured for email use, I have to be very, very sensitive to the time it takes to check for new messages for a single user ... because that time will be multiplied by 200. User #200 will then be waiting for new mail to be downloaded until we've finished checking for users #1 thru #199. Could be a very long wait if I use the "find" method ...

                         

                        Taking the approach described above has meant a process that could take over half an hour (when using the "find" approach you suggest), now takes around 10 seconds. And I hope to optimize the code to cut this more than in half.

                         

                        And I am not relying on the UID sort sequence to tell me the order in which messages should be downloaded ... only whether or not a particular message on the server is "new". (I actually use a second in-memory sort to re-order a short list of the UIDs of just the "new" messages back into the order they appear in the POP3 server's queue. But that's another story.)

                         

                        Sorry not to have provided more details before ...

                        • 9. Re: Fastest way to compare two text strings (case sensitive) ..
                          datastride

                          Alan,

                           

                          Good suggestions indeed ... Thanks for taking the time to respond. Sadly, doesn't completely solve my problem.

                           

                          If you read the reply I wrote to Tom, you'll have a more detailed view of the process I'm performing. And, yes, I can specify Unicode for the UID field of messages previously downloaded to the database. So this gives me one value list sorted properly.

                           

                          But I am using value lists (rather than the overhead of writing data to tables) of UIDs to implement a very fast "merge" style comparison loop. And I am (currently) relying on FileMaker's comparison operators (less-than and greater-than) in this "merge" loop. So this is why I need some (very fast) methodology for doing a "straight" comparison of the strings for conditions other than equality [for which I could usae the Exact() function].

                           

                          Oh, and I'm also using FileMaker's standard-issue comparison operators in my pair of custom functions that perform the in-memory sort of the UID list from the POP3 server (to prepare it for the merge operation). So I really need a stratight comparison mechanism in two places.

                           

                          Any other thoughts?

                          • 10. Re: Fastest way to compare two text strings (case sensitive) ..
                            TomHays

                            Hi Morgan,


                            Those details do provide a lot more clarity of your situation.

                             

                            If you can stand the overhead of importing the UIDs into a FileMaker table (hopefully less than 10 seconds), I think you can do

                            Show All Records (in your table of previously downloaded messages)

                            Show Related Records[Match all records in the current found set] to the imported records from the server.

                            (Catch errors in case there are no related records)

                            Show Omitted Only

                             

                            You should now have a list of those messages that are not in the previously downloaded set.

                             

                            Both tables would contain a calculated field Code(UID) which creates a case-sensitive value you can use as a match key to relate the tables. Calculating this stored calculation during import will cause a little speed hit. You can (and should) turn off indexing on other fields in that table.

                             

                             

                            -Tom

                            • 11. Re: Fastest way to compare two text strings (case sensitive) ..
                              TomHays

                              I put together a prototype do do this sequence with 6000 previously downloaded UIDs (randomly generated using the UID pattern you provided).

                               

                              It has 4000 new UIDs it imports into a second table from a text file to represent UIDs coming from the server.

                               

                              2000 of these new UIDs are already in the 6000 in the first table. 2000 of these are brand new.

                               

                              By finding related records using Code(Right(UID;16)) as the match key (to compare only the unique characters), it takes 1.8 seconds for the entire operation of importing and finding the 2000 brand new UIDs. (I had to to it 10 times in a loop to get sub-second resolution.)

                               

                              My test machine is a Mac Pro 2.66 Quad running FMP Advanced 11.

                               

                              From what you describe of your operation, the workflow change needed is to export the list of UIDs you get from the plug-in to a text file. This can be done by storing it in a global text field and using Export Field Contents. Including this step (for the list of 4000 records) adds almost no time and still keeps it at 1.8 seconds in my testing.

                               

                              -Tom

                              • 12. Re: Fastest way to compare two text strings (case sensitive) ..
                                johan

                                As previous posters have mentioned, the UID field should be indexed using Unicode. That way you won't have to perform the calculation Code(UID), which would save some time also.

                                • 13. Re: Fastest way to compare two text strings (case sensitive) ..
                                  TomHays

                                  You are right. Indexing as Unicode on the match key in the relationship to be case-sensitive. This eliminates the need for the use of Code().

                                  Sorry I didn't incorporate that earlier.

                                  • 14. Re: Fastest way to compare two text strings (case sensitive) ..
                                    datastride

                                    Tom,

                                     

                                     

                                     

                                    I just finished implementing the scheme you described, and I’m seeing times of 1 second (don’t have more accuracy available) to retrieve 2,000 UIDs from the POP3 server, export these to a text file, and then import into a scratch table in the database (with the UID field indexed).

                                     

                                     

                                     

                                    What can I say? You’re a genius.

                                     

                                     

                                     

                                    I owe you one … And Alan, too, for his suggestion to use Unicode for the UID fields.

                                     

                                     

                                     

                                    Thanks very much for all your help with this process …

                                     

                                     

                                     

                                    I guess I’ll be buying at least the first few rounds for both you gentlemen at DevCon …  

                                     

                                     

                                     

                                    Peace, love & brown rice,

                                     

                                    Morgan Jones

                                     

                                     

                                     

                                    FileMaker + Web:  Design, Develop & Deploy

                                     

                                    Certifications: FileMaker 9, 10 & 11

                                     

                                    <http://www.onepartharmony.com/> One Part Harmony 

                                     

                                    Austin, Texas • USA

                                     

                                    512-422-0611

                                    1 2 Previous Next