7 Replies Latest reply on Oct 30, 2008 12:59 PM by Sorbsbuster

    Search Performance?

    jond

      Title

      Search Performance?

      Post

      I have a data set that needs to be regularly searched and having a performance issue with FMPro8.5v2Advanced.

       

      The data consist of 15 pairs of numbers and the way the matching script is currently working the first record (30 numbers) is copied to 30 different variables, then a find is run with the first pair of numbers and all records in the table that contain the first two numbers get their match score incremented by 1, next a show omitted is run and all records that do not have the first pair of numbers get the non-match score incremented by 1. This process is repeated for the other 14 pairs of numbers.  Finally a find is done for all records that have a match score above a given threshold which the user can vary depending on their needs.

       

      The problem is that with 4000 records in the database the matching process takes around 10 seconds per record or 11 hours.

       

      Does anyone know a faster way to accomplish this?

       

       

        • 1. Re: Search Performance?
          Sorbsbuster
            

          Hi JonD,

           

          I can't see anything in this that would immediately strike me as slow to perfrom, so I must be missing something.  Can you help break it down, please?

           

          Is this what you would do manually?

          - Do you mean that if you were doing this manually, you would search two fields in all the records (4000 is teensy-weensy in FM terms) for two particular numbers?

          - Then in the found set you would do a 'Replace', updating their existing match score by one.

          - Then  you do a 'Show omitted', and a 'Replace', incrementing their non-match score by one.

          - Then you do the same with a second pair of numbers, etc...

          - (I'll ignore the last bit for the minute)

           

          If this is wrong, could you correct me, please, and if it's right, could you narrow it down by doing one loop of this manually, and highlighting which bit is the slow bit?  A 'Find', a 'Show omitted', and two 'Calculated Replace's should be the twinkling of an eye.  Okay, *fifteen* eye-twinklings might start to add up, but 11 hours is a lifetime.

           

          Alan.

          • 2. Re: Search Performance?
            jond
              

            Hi Alan, you are correct that is what would be done manually.  I captured the time before and after one complete match and it took 10 seconds.  Each individual find, replace, show omitted and replace takes just under 1 second.  All the numbers are indexed and I tried searching with the match scores indexed or not indexed and it didn't seem to make a difference.  This is running on a 3.06 GHz iMac with 4 gigs or Ram and all filemaker settings are the defaults. 

             

            A Similar algorithm on SQL server, using select statements rather than finds, took about 15 seconds (for 4000 matches).

             

            Thanks

            JonD 

            • 3. Re: Search Performance?
              Sorbsbuster
                

              Sorry, I'm lost in the math here, a bit.

               

               "Each individual find, replace, show omitted and replace takes just under 1 second"  Does that mean the complete cycle of finding the matches for the first pair of numbers, updating the match score, finding the omitted, and updating the other score?  The time sounds about right for me.

               

              But then you are going to be doing that 15 times, which is 15 x 1 seconds = 15 seconds total.

               

              I don't understand  the other time you give: "...the time before and after one complete match and it took 10 seconds" (in that I don't see how the operations are different from those in the first scenario that took only 1 second).  But if even that was to be repeated 15 times it would still be over in 2.5 minutes.

               

              The only way I can get near 11 hours is to include a multipier of 4,000.  Is there any chance that in your (doubtless looping) script, you are doing the operation once for every record, as opposed to once to every record - in other words, 4,000 times?

               

              Another aspect that maybe supports this is that I was dubious about your implication that the record containing your search criteria was just another one of your 4,000 records to be updated.  I think I'd have isolated that record as just one (or 15) in a different table - 'Match Scores', for instance and then cycled through the script once for each record (15 max) in that table.

               

              Sorry if I'm being thick about this, but you are right to think it should be dead simple.

               

              Alan.

               

              • 4. Re: Search Performance?
                jond
                  

                Each record is 15 pairs of numbers and we need to search each record against all other so I need to compare each pair of numbers in the a given record against the same pair of numbers in all other records to look for possible matches.

                 

                Here is what part of the results would look like after the 3rd pair of numbers for record #1 is searched against the rest of the database.

                 

                                    Pair#1    Pair#2    Pair#3 ...

                Record name   A1 A2     B1 B2     C1 C2 ...   Match Score   NonMatch Score

                0001              10 12      6 14        7 11 ...

                0002              11 14      6  9         7 11 ...          1                  2

                0003              10 12      6 14        7 11 ...          3                  0

                ... 

                4000 

                 

                Each find, replace, show omitted and replace (on one of 15 pairs) takes approximately 0.65 seconds so performing it on all 15 pairs of numbers in one record takes approximately 10 seconds (what I was referring to as a complete match for one record).

                 

                Yes, the script needs to be performed on all 4000 records, that is what takes approximately 11 hours.  The end user will have a given record that they are interested in and they need to know if it matches any other records in the table.

                 

                If it would help I can include the code for the script?

                 

                Thanks

                JonD

                • 5. Re: Search Performance?
                  Sorbsbuster
                    

                  Sorry for missing that point, JonD.

                   

                  I don't think I will hit the nail on the head here, but maybe my mental stumblings might trigger something for you.

                   

                  For my clarification, do you mean that Pair#2 (say) in Record 0001 will only be compared against Pair#2 in all the other records, or against all Pairs in all the other records?

                   

                  Either way, my thinking is that you are trying to count how many matches for a field (eg: 'A1 A2') there are in the rest of the Table, then set a Match Score field to be that count (or some derivative of that count).  If that is the case, why not set up a relationship from the Table back to itself, relating the field Pair#1 to Pair#1?  (if it is to match any other of the 15 pairs, then set it up as an 'OR' relationship against all of the 15 Pair# fields.)  Then simply set the Match Score field as a calculation of (Count (SelfRelationshipForPair1:: MatchScore) - 1) (say).  I subtracted 1 because it will count itself, of course.

                   

                  Using a relationship to do the count on the fly removes any need for finding, showing omitted, etc., and will show the live result immediately.  If you want to trap the Match Score values for some reason you could just have a script run through the records and set the 'live' Match Score Value into a MatchScoreValueTEMP number field, which can then be indexed, sorted, searched, etc.

                   

                  I'm sure I haven't nailed it, but maybe you can use this?

                   

                  Alan.

                  • 6. Re: Search Performance?
                    jond
                      

                    Hi Alan, the numbers in pair #2 would only be matched against pair #2 in all other records but in order for two records to match then they need to be identical at atleast 12 of the 15 pairs of numbers. 

                     

                    I'm not sure I fully understand this approach but I think that score would indicate how many other records match the current record at one pair of numbers, is that correct?  Do you know how I could get from this count to the set of all records that match a given record at atleast 12 of the pairs of numbers?

                     

                    Thanks

                    JonD

                     

                     

                    • 7. Re: Search Performance?
                      Sorbsbuster
                        

                      Hmmm: that makes it a bit more tricky.

                       

                      Yes, you are right that it will indicate how many other records match that Pair - I didn't realise you wanted to add them the way you describe, sorry.

                       

                       

                      I can give you the equivalent of the OR relationship I mentioned; in other words, show you in each of the 4,000 records a portal that lists all of the other records that match any of the Pairs. But I don't know what you want to do with multiple matches.  For example:

                       

                      Suppose:

                       

                      Record A matches 12 times against records B and C.

                      And

                      Record A matches 14 times against records D and E and F

                       

                      Do you want a count of  '5' (because it has five other records that it matches at least 12 times)?

                       

                      BTW: I don't want to give you false hope, here: I don't want to suggest that if you tell me the answer to that question I can solve it!  But it is currently a confusion in my thinking.

                       

                      Sorry I can't be of more help,

                      Alan.