1 2 Previous Next 17 Replies Latest reply on Aug 14, 2017 12:19 PM by TomHays

    Common words in field

    flamingvan

      Let's say I have a table named "answers" with a field named "answerText", and I collect survey data in this field.  To analyze the data, I want to display a list of the 10 most common words used in the text in this field.  For example I want a list like:

       

      dissatisfied       -- 1023

      happy               -- 826

      recommended  --721

      etc.

       

      I already built a solution, but I had to make a related table named "words".  Every time an answer record is created, a group of "word" records are created, one for each word in the answer.  Then I can just group (or subsummary) by word in a report.

       

      But I'm wondering if there's a better way?

        • 1. Re: Common words in field
          patricia

          This sounds like you are not controlling the word entry and they can enter free form. It you control the entry with a popup menu, you would not need the word table.

           

          paticia

          • 2. Re: Common words in field
            flamingvan

            Yes that's true but it's what's intended.  The input for the user is a textarea box so they can write paragraphs or whatever.  But I still want to break it down by word.

            • 3. Re: Common words in field
              patricia

              Then I can not think of a better way. Someone else may have one.

              Good Luck on your project.

               

              patricia

              • 4. Re: Common words in field
                fmpdude

                So Easy!!!


                I use a microservice and do a quick FMP INSERT FROM URL script step to the service.

                 

                EXAMPLE 1:

                Assuming these are the words in my field: The lazy lazy lazy brown cow jumped over the moon quickly

                 

                The service returns: {over=1, the=2, moon=1, lazy=3, quickly=1, jumped=1, cow=1, brown=1}

                 

                But, it's simple to make it return each of these as RETURN-delimited string, too, or format in any way needed

                 

                ----------------------------------------

                 

                EXAMPLE 2:

                From the standard Lorem example words:

                 

                "But I must explain to you how all this mistaken idea of denouncing pleasure and praising pain was born and I will give you a complete account of the system, and expound the actual teachings of the great explorer of the truth, the master-builder of human happiness. No one rejects, dislikes, or avoids pleasure itself, because it is pleasure, but because those who do not know how to pursue pleasure rationally encounter consequences that are extremely painful. Nor again is there anyone who loves or pursues or desires to obtain pain of itself, because it is pain, but because occasionally circumstances occur in which toil and pain can procure him some great pleasure. To take a trivial example, which of us ever undertakes laborious physical exercise, except to obtain some advantage from it? But who has any right to find fault with a man who chooses to enjoy a pleasure that has no annoying consequences, or one who avoids a pain that produces no resultant pleasure?"

                 

                The service returns (highest counts, first 10 ONLY) - sorted from highest to lowest

                pleasure=7

                to=7

                of=7

                a=5

                the=5

                who=5

                pain=5

                because=4

                but=4

                and=4

                 

                ---------------------------------

                 

                This particular service method also works with ALL CPU CORES so it's extremely (extremely) fast.

                This is the approach I use.

                 

                What do you think?

                • 5. Re: Common words in field
                  flamingvan

                  I think that seems pretty interesting.  I never heard of that script step before.  I'm definitely going to keep it in mind as an option.   Thanks!

                  • 6. Re: Common words in field
                    sccardais

                    Flamingvan:

                     

                    I have the same requirement and would love to know how you are addressing this. Every month, we receive about 500 customer satisfaction surveys (NPS) and customer-submitted product suggestions. All of these are submitted in free form which makes it very difficult - as you know - to accurately classify them consistently. Asking our customers to submit their ideas in a more structured format, as one person in this thread suggested, is not an option for us either.

                     

                    All of the ideas we receive go into an FMP database where they are manually processed to identify the Topic (or Topics) and Sentiment (Positive / Negative). It's extremely difficult to identify the Topics consistently. I'm sure you can relate.

                     

                    • Do you also want to track key phrases such as "Ease of Use", "Easy to Use","User Friendly", "Not easy to use", "Hard to Use" etc.?

                     

                    • What about tracking synonyms that should be counted together?
                      • "Posting Fees" = "Billing"
                      • epayments = credit card,
                      • etc.

                     

                         To accurately aggregate and priroritize customer topics, I think the ability to create your own customer dictionary of words, phases and synonyms is important.

                     

                    • What about ignoring certain "stop words" such as "to, the, and, a", etc.

                     

                    Have you seen and considered some of the text analytics tools out there such as RapidMiner, Knime or others?

                     

                    FMPDUDE mentioned a microservice but not by name. Is this a separate site that accepts submissions in the form of .csv files (or similar) that returns word counts? Can you share the name of this site?

                     

                    Would love to compare notes with you on this. It's a fascinating application.

                    • 7. Re: Common words in field
                      sccardais

                      FMPDUDE

                       

                      I'd like to learn more about this microservice if you can share.

                      • 8. Re: Common words in field
                        fmpdude

                        You're welcome.

                         

                        Remember, you do need to point the INSERT FROM URL to a little microservice you write (a URL as in an HTTP GET or POST), but to do what you want is only a couple lines of Java! Super easy. Optionally, uses all processor cores.

                         

                        F-A-S-T

                         

                        A field or a book. Less than one second.

                         

                        ---

                         

                        As a larger test, I downloaded a book at random: The Project Gutenberg EBook of In Spite of All, by Edna Lyall

                         

                        The code did the frequency count so quickly it wasn't worth measuring. Probably a tenth of a second or less.

                         

                        Here are the top 10 from that book:

                        the=9739

                        and=4808

                        to=4749

                        of=4210

                        a=3094

                        in=2679

                        he=2652

                        his=2171

                        that=2026

                        i=1954

                        • 9. Re: Common words in field
                          fmpdude

                          sccardais wrote:

                           

                          FMPDUDE mentioned a microservice but not by name. Is this a separate site that accepts submissions in the form of .csv files (or similar) that returns word counts? Can you share the name of this site?

                           

                          Would love to compare notes with you on this. It's a fascinating application.

                          I wrote the microservice. Very very easy.

                           

                          Literally, to do what the OP wants, is only a couple lines of code. And, using Java 8, you can do it all decoratively and use all processor cores. Having an "ignore" list is not difficult, either, if needed.

                           

                          With larger fields (beyond FMP's GET can handle), you'd use POST with INSERT FROM URL.

                           

                          Or, you could even use JDBC in a standalone Java program and read the FMP database directly, do the word frequency analysis, and write the results right back to another FMP field. Thus, using JDBC, you avoid INSERT FROM URL and just access the database directly.

                           

                          Or, a hybrid approach, would be to use a microservice to kick off a JDBC process which does what you need.

                           

                          Let me know your thoughts.

                          • 10. Re: Common words in field
                            sccardais

                            Let me know your thoughts.

                             

                            FMPDUDE

                             

                            Thanks for the fast reply. Regarding my thoughts … I don’t have the technical experience to implement what you describe but I can reiterate that I think the application described by the OP is fairly common and potentially very useful but it goes well beyond counting words.

                             

                            As you may know, the area of Data Analytics and Text Analytics is growing very rapidly. New companies seem to surface almost every day.  RapidMiner and Knime are two free but VERY powerful examples but there are many more.

                             

                            I’ve looked at several and even attended one of their conferences but concluded that the technical expertise is beyond me. But - the application is widespread and important in my opinion.

                             

                            According to research, 80% of the information received and saved by businesses arrives unstructured / free form. This includes surveys, customer suggestions, email correspondence, transcribed audio, etc.

                             

                            Because it’s so difficult to interpret free form text consistently, most companies don’t do it. It’s a tremendous waste and opportunity.

                             

                            I don’t know the OP’s specific application but our application would need more than a simple word count. I described some of the enhancements in my reply to the OP.

                             

                            - omit counts of “stop words” . Search “stop words list” and you’ll find several lists in several languages. here’s a link to one. http://www.lextek.com/manuals/onix/stopwords1.html <http://www.lextek.com/manuals/onix/stopwords1.html>

                             

                            - allow creation of custom dictionary of common words and phrases to count. The ability to define a custom dictionary is important for different industries or departments within the same company. e.g. Marketing and Development will have different dictionaries.

                             

                            - allow creation of a custom “thesaurus” for synonymous words and phrases. e.g. "Ease of Use" is the same as “Easy to Use” and should be counted as the same topic. Customer Support = Customer Service, etc. Different Orgs in different industries will want to create their own and update as they gain more experience with the system.

                             

                            All of these are intermediate steps towards the goals of

                             

                            - identifying and rank ordering the topic or topics mentioned in the text block. e.g. What is this comment about?

                             

                            - assigning sentiment score (positive or negative) to each.

                             

                            If text blocks could be automatically and accurately evaluated to identify topic and sentiment, it would allow companies to set priorities for Development. It would help Marketing develop better messaging, It would improve training for Sales and Customer Support. It would help the entire company.

                             

                            We’ve been doing this on a small scale manually for several years.  For example, we know that the "rate of software improvements” we introduce is one of the most frequently mentioned topics in our customer surveys and that the sentiment about our rate of improvements is overwhelmingly positive. This is just one example. We also know that Price is rarely mentioned - either positively or negatively. Customer Support is another commonly mentioned topic and sentiment is very positive.

                             

                            But, scoring this information manually isn’t scalable and it’s error prone. The same person might analyze the same comments differently from one week to the next.

                             

                            I am under no illusion that this is an easy task and I’m not expecting you to solve this problem! But please do if you can! I’m sure it is extremely difficult but I believe many companies have the same problem / need.

                             

                            So - those are my thoughts. Thanks for asking. Hope this helps.

                            • 11. Re: Common words in field
                              fmpdude

                              You might want to look at (free) libraries like "WEKA" for your needs.

                               

                              Weka 3 - Data Mining with Open Source Machine Learning Software in Java

                              • 12. Re: Common words in field
                                Orlando

                                If you want something that is native in FileMaker, I found a Custom Function on Brian Dunning's website a while back, FileMaker Custom Function:SummariseValues ( values ; separator ) , and started trying something similar to the OP's request to count occurrences of words in text.

                                 

                                It uses a recursive custom function that takes the first value, counts the number of occurrences of that value in the text then strips it out of the rest of the text and then move on to the next one.

                                 

                                I have attached a demo file I put quickly put together to show the function in action, and split the results into a separate table.

                                 

                                Unlike fmpdude suggestion, Its a recursive function so will hit a limit on the amount of text you can pass in, if you are processing allot of text or this does not perform as well as you like then a microservice will be ideal.

                                 

                                Hope this helps !

                                 

                                -- Orlando

                                • 13. Re: Common words in field
                                  sccardais

                                  FMPDUDE

                                   

                                  Please forgive this basic question but what is a "microservice" in FMP? How is it different from a custom function, for example?

                                  • 14. Re: Common words in field
                                    fmpdude

                                    FMP does not have microservices. But, you can communicate with a microservice (or other REST service) from FMP.

                                     

                                    A microservice is a small REST application that FMP can communicate with via its INSERT FROM URL script step. Since services are often written in a language like Java, there's little it can't do. Connecting to FMP or any other database is not difficult. Additionally, since services are "compiled", they're fast. Plus, it's all free, and any program that can access HTTP GETs or other HTTP verbs can use it.

                                     

                                    FMP's 'openness' with INSERT FROM URL is the major reason I keep using FMP.

                                     

                                    Doing serious parsing like you're talking about, with sub-second performance, may need to be done externally to FMP.  Remember, that with a service, you can parse text from not just FMP, but any application that can do a GET (Terminal, browser, ....).

                                     

                                    I'll be glad to answer any other questions you might have.

                                    1 2 Previous Next