    How to mark redundant records



           Would appreciate if anyone could point me in the right direction.

           I have a simple database in which each record contains a text field, a frequency field and a number-of-words field. I want to either mark or delete the redundant phrases, as in the example below. Are there any built-in functions for handling this in FM12?

           Text field                 Frequency   No. words
           The cat sat on the mat     6           6
           The cat sat on the         6           5
           The cat sat on             6           4
           The cat sat                6           3

        • 1. Re: How to mark redundant records

               and what I want to do is either mark or delete the redundant phrases as below.

               Can you give an example? I don't see any phrases marked or deleted as redundant. Do you mean that "The cat sat on the" is redundant because another record has "The Cat sat on the mat"?

          • 2. Re: How to mark redundant records

                 Yes, that's right. The first record should be retained, the others are redundant and thus should be marked.

                 However, if there were a record "The cat sat on the" with a frequency of 2 then that should not be marked or deleted as there might be another word following e.g. "stairs".


            • 3. Re: How to mark redundant records

                   So the following entries in red are redundant and the ones in black are not?

              The Cat

              The cat sat

                   The cat sat on the mat

                   The cat sat on the stairs

                   Do you have FileMaker Advanced? I can conceive of a simple custom function and a self join relationship that will successfully match values between records.

                   cfWordList ( Words )
                   Case ( WordCount ( Words ) < 2 ; Words ;
                              List ( LeftWords ( Words ; 1 ) ; cfWordList ( RightWords ( Words ; WordCount ( Words ) - 1 ) ) ) )
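For anyone who wants to check what the recursive custom function produces, here is a rough Python sketch of the same recursion. It is only an approximation: `word_list` is a hypothetical name, Python's `split()` does not follow FileMaker's word-separator rules exactly, and a newline stands in for the return character that `List` puts between values.

```python
def word_list(words: str) -> str:
    """Return the words of `words` one per line, mirroring the recursion above.

    Assumption: whitespace splitting stands in for FileMaker's word rules.
    """
    parts = words.split()
    # Base case: zero or one word, return the text as-is (trimmed).
    if len(parts) < 2:
        return words.strip()
    # Recursive case: first word, then the word list of the remaining words.
    first, rest = parts[0], " ".join(parts[1:])
    return first + "\n" + word_list(rest)


print(word_list("The cat sat on the mat"))
```

Each word ends up on its own line, which is the shape a multi-value key field needs.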

                   Then you can define a MultiValue key field, cPhraseMatchKey as: cfWordList ( YourTextField )

                   Define a calculation field with a number result, cWordCount, as WordCount ( YourTextField ). I think you have this field already.

                   and use both fields in a relationship defined like this:

                   YourTable::cPhraseMatchKey = YourTable 2::cPhraseMatchKey AND
                   YourTable::cWordCount < YourTable 2::cWordCount

                   Note, if I have this correct, the second pair of match fields should allow "The Cat Sat" to match to "The Cat Sat On The Mat" but not the reverse.
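The two match conditions can be sketched outside FileMaker to see which record pairs would be related. This is a minimal Python sketch under stated assumptions: `key_values` and `is_related` are hypothetical names, a multi-value key is taken to match when the two sides share at least one value, and the comparison is treated as case-insensitive.

```python
def key_values(text: str) -> set:
    """Values of the multi-value key: one lower-cased word per value (assumed)."""
    return {word.lower() for word in text.split()}


def is_related(left: str, right: str) -> bool:
    """True if `left` would match `right` under both relationship criteria."""
    # Criterion 1: the two key fields share at least one value.
    shares_value = bool(key_values(left) & key_values(right))
    # Criterion 2: the left record has strictly fewer words than the right.
    fewer_words = len(left.split()) < len(right.split())
    return shares_value and fewer_words


print(is_related("The cat sat", "The cat sat on the mat"))
print(is_related("The cat sat on the mat", "The cat sat"))
```

Under these assumptions, two records that merely share a single word (for example "The dog" and "The cat sat") also count as related, so the related set may still need a check that the shorter phrase really appears inside the longer one.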

                   This, BTW, can also be used to calculate the frequency. You can then find all records where the Frequency is greater than 1 and use Go To Related Records to bring up a group of redundant phrase records. Sort them by word count so that you keep the phrase with the most words, then mark or delete the rest.

              • 4. Re: How to mark redundant records

                     Hi Philmodjunk


                     Thanks again for your input. I think you have what I'm after. I already have the frequencies. And my definition of redundancy is where an identical string of words is contained within another record with the same frequency.

                     So if I have a string of 25 words, all with a frequency of 25, the longest one is to be retained and all the others are to be marked for deletion.

                     However, if any of the strings have a frequency of less than 25, then they should not be marked for deletion.
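The redundancy rule described in this post can be written out as a small Python sketch, purely to pin down the logic before building it in FileMaker. `contains_phrase` and `mark_redundant` are hypothetical names, the sample rows come from the first post plus one invented row ("The cat", 9) to show the frequency check, and case-insensitive word comparison is an assumption.

```python
def contains_phrase(longer: list, shorter: list) -> bool:
    """True if `shorter` appears in `longer` as a contiguous run of words."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def mark_redundant(records: list) -> list:
    """records: list of (text, frequency) pairs. Returns one flag per record.

    A record is flagged when a longer record with the same frequency
    contains its words in the same order.
    """
    tokens = [text.lower().split() for text, _ in records]
    flags = []
    for i, (_, freq) in enumerate(records):
        redundant = any(
            j != i
            and other_freq == freq
            and len(tokens[j]) > len(tokens[i])
            and contains_phrase(tokens[j], tokens[i])
            for j, (_, other_freq) in enumerate(records)
        )
        flags.append(redundant)
    return flags


records = [
    ("The cat sat on the mat", 6),
    ("The cat sat on the", 6),
    ("The cat sat on", 6),
    ("The cat sat", 6),
    ("The cat", 9),  # invented row: different frequency, so it is kept
]
print(mark_redundant(records))
```

The longest phrase in each same-frequency group is never flagged, because nothing longer contains it.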

                     Is this the type of work you do professionally? If so perhaps I can consult you professionally for a "budget" solution off list?



                • 5. Re: How to mark redundant records

                       You may click the icon to the left of my comments to send me a private message.