17 Replies Latest reply on May 2, 2017 2:27 PM by pjohnson64

    Gender conversion

    pjohnson64

      I am using WebDirect and trying to replace the "Find/Replace" script step. I am trying to use Set Field with Substitute(). The problem I am having is that Substitute() does not look at whole words: replacing "he" with "she" also turns "helium" into "shelium". Is there a way to make Substitute() look only at whole words instead of parts of words? I tried including spaces, such as " He " and "He ", and this almost works, but not when "He" is the first word in the field. Any advice would be appreciated.

        • 1. Re: Gender conversion
          aknudsen

          Use ==he; that will match the whole field.

          • 2. Re: Gender conversion
            beverly

            Powerful find operators can be used manually and in scripted finds.

            Exact() is another function for matching (it is case-sensitive).

            beverly

            • 3. Re: Gender conversion
              TomHays

              pjohnson64 wrote:

               

              I tried spaces such as " He " and "He " but this does not work in the case that "He" is the first word in the field, but almost works.

               

              You could prefix the field with a space to deal with that situation specifically.

              Substitute(" " & myField; " He "; " She ")

              You could wrap that expression in Trim() to remove the leading space that was added, in situations where it is OK to also remove any preexisting leading and trailing spaces. Or you can remove the leading space more surgically with careful use of Let(), Middle() or Right(), and Length().
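
              A minimal sketch of the more surgical approach, assuming the same myField as above: because the result always begins with the space that was added, dropping exactly one leading character leaves any original spacing intact. The variable names padded and replaced are only illustrative.

              Let ( [
                 padded = " " & myField ;
                 replaced = Substitute ( padded ; " He " ; " She " )
              ] ;
                 Right ( replaced ; Length ( replaced ) - 1 ) // drop only the space added above
              )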

               

              Again you will find that this will work well when "He" is the first word in the field and followed by a space, but on closer inspection there can be additional situations where it will fail.

               

              "He, knowing that the train is coming, jumped aside." (not substituted due to comma instead of space separating word)

               

              "In that case he and his brother both jumped aside." is left unchanged, which shows two more gaps: the lowercase "he" is not matched when you are looking for "He", and "his" is not changed to "her".

              EDIT:  Substitute() IS case-sensitive.  Substitute() will not find "he" when you are looking for "He".

               

              Natural language processing is very tricky.  If the field contains something more restrictive than full paragraphs of text, the calculation doesn't have to be as complex.

              Can you narrow down the problem space a bit by specifying any limitations there are to the text in the field?

              For example:

              Does "He" appear only once in the field (or only once in each line of the field)?

              Is a space the only possible word delimiter?

              Is "He" always capitalized in the field?

               

              Are you using FileMaker Pro Advanced, and thus able to create custom functions?

               

              -Tom

              • 4. Re: Gender conversion
                Tom_Droz

                I think if you run the special case separately you will be good.  If "He " occupies positions 1-3, substitute "She ".
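
                A minimal sketch of that idea, assuming the text is in a field called myField (a placeholder name). Exact() keeps the check on the first three characters case-sensitive, and the space-delimited Substitute() then handles later occurrences. Words followed by punctuation are still not covered.

                Let ( [
                   head = Case (
                      Exact ( Left ( myField ; 3 ) ; "He " ) ;
                         "She " & Middle ( myField ; 4 ; Length ( myField ) ) ; // special case: "He " at the very start
                      myField
                   )
                ] ;
                   Substitute ( head ; " He " ; " She " ) // remaining occurrences between spaces
                )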

                • 5. Re: Gender conversion
                  TomHays

                  If you have FileMaker Pro Advanced, then adding the following recursive custom function should allow you to do the replacement in a calculation.

                  It should easily handle text that contains fewer than 1,000 words.  I think it will be proper tail recursion, so the hard limit on the number of words depends on the app (FileMaker Pro, FileMaker Go, etc.).  With FileMaker Pro, the hard limit may be as high as 50,000 words.  (Note that it recurses over words and not characters.)

                   

                  FindAndReplaceExactWord(myText; "He"; "She"; "")

                   

                  will find and replace all occurrences of "He" with "She" only if "He" is a separate word regardless of its position in the text.  It will respect upper and lower case to match only "He" and not "he".

                   

                  FindAndReplaceExactWord(myText; "he"; "she"; "")

                  will find only the lowercase "he" and replace with "she".

                   

                  -Tom

                   

                   

                  /* FindAndReplaceExactWord (text; oldword; newword; empty)

                  Invoke as FindAndReplaceExactWord (text; oldword; newword; "")

                  The last argument is used internally.

                  The search is case-sensitive.

                  The value of oldword will only be replaced if it is a distinct word.

                  */

                  Case(

                  IsEmpty(text); "";

                  IsEmpty(oldWord); "";

                  WordCount(text) = 0; text;

                  Let([

                  nextWord = LeftWords( text; 1);

                  nextPos = Position(text; nextWord; 1; 1) + Length(nextWord);

                  wordText = Middle(text; 1; nextPos -1);

                  postText = Right(text; Length(text) - nextPos + 1);

                  newwordText = Case(

                     Exact(nextWord; oldword); Substitute(wordText; [oldword;newword] );

                     wordText

                     ) // end Case

                  ];

                  Case(

                     WordCount(postText) > 0;

                        FindAndReplaceExactWord(postText; oldword; newword; empty & newwordText);

                     empty & newwordText & postText

                  )

                  ) // End Let

                  ) // End Case

                  • 6. Re: Gender conversion
                    pjohnson64

                    I'm not sure that these will work, because I am not matching a single word in a field but a string of text in a field, and there could also be multiple occurrences of the word in the string. Could the "=" or "==" be added to Substitute ( Field ; "He" ; "She" ), such as Substitute ( Field ; "=He" ; "She" )?

                    • 7. Re: Gender conversion
                      TomHays

                      No, the "=" and "==" operators are only for entering into a field in Find mode.  Substitute() performs a simple text replacement that matches every occurrence.

                      EDIT: Substitute() is case-sensitive (I originally stated here that it was not).

                       

                      In order to use Substitute() as you require, you have to be very explicit about what string you are replacing and handle upper and lower case on your own.
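
                      A minimal sketch of what being explicit can look like, assuming a field named myField (a placeholder) and that a single space is the only word delimiter: pad both ends with a space so the first and last words can match, list one pair for each capitalization, then strip the two added spaces.

                      Let ( [
                         padded = " " & myField & " " ;
                         replaced = Substitute ( padded ; [ " He " ; " She " ] ; [ " he " ; " she " ] )
                      ] ;
                         Middle ( replaced ; 2 ; Length ( replaced ) - 2 ) // remove the two padding spaces
                      )

                      This sketch still misses words that are followed by punctuation, and it can miss a matching word that directly follows another matching word.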

                       

                      -Tom

                      • 8. Re: Gender conversion
                        pjohnson64

                        Tom, the custom function works great! Thank you very much. I am going to stare at it for a while until I understand why it works. Thanks again.

                        • 9. Re: Gender conversion
                          siplus

                          So you tried spaces, and that's good. You hit the wall for some situations, i.e. when "He" is the first word, and probably when "He" is the last word. Plus some punctuation.

                           

                          Try the following in the Data Viewer:

                           

                           

                          Let ([

                           

                          myText = "He. He he he. He said.¶He wants. He; he and he. He's a she. He: the last word. He!. Oh, he? Yes, hey, it's him, he.";

                           

                          firstTransformation = " " & substitute(myText; [ "."; " ||punct||" ]; [ "'"; " ||apost||" ]; [ "!"; " ||Exclam||" ]; ["?"; " ||Ask||"]; [¶; " ||para|| "]; [";"; " ||semic||"]; [":" ; " ||colon||"]);

                           

                          secondTransformation = Substitute(firstTransformation; [" he "; " she "]; [" He "; " She "]);

                           

                          thirdTransformation = Substitute(secondTransformation; [" he "; " she "]; [" He "; " She "]);

                           

                          fourthTransformation = Substitute(thirdTransformation; [" ||punct||"; "." ]; [" ||apost||"; "'" ]; [" ||Exclam||" ; "!"]; [" ||Ask||"; "?"]; [ " ||para|| "; "¶"]; [" ||semic||"; ";"]; [" ||colon||"; ":"])

                           

                          ];

                           

                          Trim(fourthTransformation)

                           

                          )
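
                          A note on why the second and third transformations above are identical, as far as I can tell: Substitute() does not rescan text it has already consumed, so when two matching words are adjacent, the space between them is used up by the first match and the second word is skipped. A small check for the Data Viewer (the names once and twice are only illustrative):

                          Let ( [
                             once = Substitute ( " he he he " ; " he " ; " she " ) ; // single pass
                             twice = Substitute ( once ; " he " ; " she " ) // same substitution applied again
                          ] ;
                             once & "¶" & twice
                          )

                          The first pass should leave the middle "he" untouched, and the second pass should catch it.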

                          • 10. Re: Gender conversion
                            TomHays

                            Motivated by siplus's post to reexamine this, I discovered that I was in error when asserting in my previous posts that Substitute() did not take the upper or lower case of the search string into account.

                             

                            Substitute(" He is a he and not a she "; [" He "; " She "])

                            will yield " She is a he and not a she " with the lowercase "he" untouched.

                             

                            -Tom

                            • 12. Re: Gender conversion
                              pjohnson64

                              I made a Set Field step for each version. As it was already set up, I just had to add the custom function. It seems to work with the tests I have tried.

                              • 13. Re: Gender conversion
                                pjohnson64

                                I read that article and it does not seem to apply, as I am looking to replace full words and not sections of a word. I have tried several tests and it seems to be working with the custom function that Tom posted. I am going to give your post a try to see how that performs. It is awesome to see the different methods and work to grow my understanding. Thank you.

                                • 14. Re: Gender conversion
                                  TomHays

                                  Here is an improvement of the custom function which iterates over words with string matches found with Position() instead of iterating over every word in the text.  This should be very efficient in recursion for large input text strings with few matches so it easily handles text over 50,000 words.

                                  In the pathological worst case scenario of a single-letter value of oldword appearing in every word, this will revert to the performance of iterating over individual words.

                                   

                                  -Tom

                                   

                                  /* FindAndReplaceExactWord (text; oldword; newword; empty)

                                  Recursive custom function.

                                  Invoke as FindAndReplaceExactWord (text; oldword; newword; "")

                                  The last argument is used internally.

                                  The search and replace is case-sensitive.

                                  The value of oldword will only be replaced if it is a distinct word.

                                  This iterates over potential matches found by Position() so it is limited in recursion by the number of matches.

                                  Worst case scenario of a match found in every word, the recursion will happen for every word in the text.

                                  */

                                  Case(

                                  IsEmpty(text); "";

                                  IsEmpty(oldWord); "";

                                  WordCount(text) = 0; text;

                                  Let([

                                  rawMatchPosnStart = Position(text; oldword; 1; 1);

                                  /* If the match is ultimately good, then rawWordBoundaryStart should be at the word delimiter

                                  or start of the matched word if it was located at the beginning of the text.

                                  This is where the text will be divided on this iteration. */

                                  rawWordBoundaryStart =

                                     Case(

                                        rawMatchPosnStart = 1; rawMatchPosnStart;

                                        rawMatchPosnStart - 1

                                     );

                                  preText = Left(text; rawWordBoundaryStart - 1);

                                  // remainingText = Right(text; Length(text) - rawWordBoundaryStart + 1);

                                  rawMatchWord = LeftWords( Right(text; Length(text) - rawWordBoundaryStart + 1); 1); // This is the full word that contains the raw match to oldword.

                                  // This is the character position immediately after the rawMatchWord. Everything from here on is the unprocessed portion.

                                  afterRawMatchWordPos = Position(text; rawMatchWord; rawWordBoundaryStart; 1) + Length(rawMatchWord);

                                  postText = Right(text; Length(text) - afterRawMatchWordPos + 1);

                                  // This is the full text sequence that may contain the matched word and a word delimiter.

                                  // It is isolated from preText and postText.

                                  rawWordText = Middle(text; rawWordBoundaryStart; afterRawMatchWordPos - rawWordBoundaryStart);

                                  newwordText =  Case(

                                     Exact(rawMatchWord; oldword); Substitute(rawWordText; [oldword;newword] );

                                     rawWordText // There was no exact match so leave this text alone.

                                     ) // end Case

                                  ];

                                  Case(

                                     rawMatchPosnStart = 0;    empty & text;  // The raw sequence was not found in text.

                                     WordCount(postText) > 0; // There are still unexamined words after finding and dealing with a raw match.

                                        FindAndReplaceExactWord(postText; oldword; newword; empty & preText & newwordText);

                                     empty & preText & newwordText & postText // No new words to process.

                                  )

                                  ) // End Let

                                  ) // End Case
