3 Replies Latest reply on Jun 2, 2016 2:38 PM by erolst

    How to find any of multiple values to evaluate companies

    ncbeach

      Hi, I have a table (Scraped Data) holding text scraped from thousands of companies' websites. There is one record per company, and all of a company's scraped text is concatenated into a single field.

       

      I need to "query" the scraped data field for each company against lists of keywords. The matches determine what values go into the fields of a separate table of company profiles (Company Profiles table), such as the products they sell, the customers they serve, the appropriateness of the site's content, etc. Some of the keyword lists contain hundreds of keywords.

       

      Please see the attached slide showing what I'm trying to do.

       

      For example, if any keywords in the list indicating appropriateness of content (to detect profanity, sexual content, etc.) are found in the scraped data for a company, the field for content appropriateness in the profile of that company should read "Inappropriate Content". 

       

      If keywords indicating that a company sells apparel are found, then "Apparel" should be added to the product categories field for that company.

       

      I've searched here, on YouTube, etc., but I can't figure out how to "query" (or search, find, etc.) the Scraped Data table against each table of keywords, determine what should go into each field of the Company Profiles table, and create output that can then be put into those profiles.

       

      Any guidance you can give me on how to do this would be really, really appreciated.  Thanks a lot!

        • 1. Re: How to find any of multiple values to evaluate companies
          erolst

          A basic approach to this (in pseudo-code):

           

          Read a list of keywords into a variable

          Read a keyword list ID into a variable

          # [ that could indicate keyword type ]

          Find relevant company records

          Go to Record [ first ]

          Loop

            # [ record loop ]

            If [ record is not marked processed for that keyword list ID ]

              Set keywordCounter to 0; set keywordCount to the number of keywords [ ValueCount() ]

              Read scraped text into variable

              Loop

                # [ keyword loop ]

                Increment keywordCounter

                Get keyword for iteration [ e.g. GetValue() on a return-delimited list ]

                If keyword in scraped text [ PatternCount(), or Position() ]

                  take action [ e.g. create related record in a CompanyKeyword table for keywordType and found keyword ]

                End if

                Exit Loop if [ keywordCounter ≥ keywordCount ]

              End Loop

              Mark record as processed for keyword list ID

            End if

            Go to next record [ exit after last ]

          End Loop
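
          For readers who want to see the two nested loops in something they can run, here is a minimal sketch in Python rather than FileMaker script steps. It is only an illustration of the logic above: the names find_keyword_hits, process_companies and the in-memory data structures are hypothetical stand-ins for the FileMaker tables and variables, and the case-insensitive substring test is an assumption about how you want matching to behave.

```python
def find_keyword_hits(scraped_text, keywords):
    """Return the keywords found anywhere in scraped_text.

    Matching is a plain case-insensitive substring test (an assumption),
    so a short keyword can also match inside a longer word.
    """
    haystack = scraped_text.lower()
    hits = []
    for keyword in keywords:                 # keyword loop
        needle = keyword.strip().lower()
        if needle and needle in haystack:    # skip empty entries
            hits.append(keyword.strip())
    return hits


def process_companies(companies, keyword_list_id, keywords, processed):
    """Check every unprocessed company against one keyword list.

    companies: dict mapping a company id to its scraped text (hypothetical)
    processed: set of (company id, keyword list id) pairs already handled
    Returns rows of (company id, keyword list id, keyword), standing in
    for records created in a CompanyKeyword table.
    """
    rows = []
    for company_id, scraped_text in companies.items():   # record loop
        if (company_id, keyword_list_id) in processed:
            continue                                      # already done
        for keyword in find_keyword_hits(scraped_text, keywords):
            rows.append((company_id, keyword_list_id, keyword))
        processed.add((company_id, keyword_list_id))      # mark as processed
    return rows


if __name__ == "__main__":
    sample = {1: "We sell T-shirts and jeans", 2: "Industrial pumps"}
    done = set()
    for row in process_companies(sample, "apparel", ["jeans", "t-shirts"], done):
        print(row)
```

          Because the test is a substring match, a keyword such as "sock" would also be found inside "socket"; whether that is acceptable depends on your keyword lists.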

           

          ncbeach wrote:

          For example, if any keywords in the list indicating appropriateness of content (to detect profanity, sexual content, etc.) are found in the scraped data for a company, the field for content appropriateness in the profile of that company should read "Inappropriate Content".

          You would be better off having a dedicated table related to Company into which you write records that indicate the (type of) keyword, and the company's status regarding that keyword. (As indicated in the sample code above.)
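
          As a rough illustration of why the dedicated table helps, the sketch below (Python again, not FileMaker) derives the profile values from such rows instead of writing them directly into the profile. The row layout, the summarize_profiles name and the labels mapping are all hypothetical; in FileMaker the same result would typically come from the relationship between the two tables.

```python
from collections import defaultdict


def summarize_profiles(rows, labels):
    """Turn keyword-hit rows into one list of profile labels per company.

    rows:   iterable of (company id, keyword type, keyword) tuples,
            standing in for records in a CompanyKeyword table
    labels: dict mapping a keyword type to the text shown in the profile
    """
    profiles = defaultdict(set)
    for company_id, keyword_type, _keyword in rows:
        if keyword_type in labels:
            # One label per keyword type, however many keywords matched.
            profiles[company_id].add(labels[keyword_type])
    return {company_id: sorted(found) for company_id, found in profiles.items()}


if __name__ == "__main__":
    hits = [
        (1, "apparel", "jeans"),
        (1, "apparel", "t-shirts"),
        (2, "content", "some flagged word"),
    ]
    labels = {"apparel": "Apparel", "content": "Inappropriate Content"}
    print(summarize_profiles(hits, labels))
```

          Keeping the individual hits also lets you see which keyword triggered a label, which is useful when a keyword list needs tuning.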

          • 2. Re: How to find any of multiple values to evaluate companies
            ncbeach

            Thanks! 

             

            I'm a newbie to FileMaker, so I'll need to read up and watch some videos on using loops, variables, etc. to hopefully figure this out and give it a try.  I really appreciate your time answering the question.

            • 3. Re: How to find any of multiple values to evaluate companies
              erolst

              ncbeach wrote:

              I'm a newbie to FileMaker, so I'll need to read up and watch some videos on using loops, variables, etc.

              Good idea.

               

              It can also be helpful to set up a small sample database with just a few fields (so you're not overwhelmed by details) and try stuff out.

               

              Don't hesitate to come here and ask for assistance if any concepts are (or remain) unclear.