8 Replies Latest reply on Nov 25, 2012 12:50 AM by comment

    Attempting to extract data from table (Web Scraping)

    nwhawksfan

      In an attempt to keep myself from throwing my MacBook out the window, I am posting this here.

       

      I am pretty new to FileMaker, and maybe trying to bite off more than I can chew... story of my life.

       

      I have used the GetUrl script and pasted the HTML data into a field called "url_g". From this HTML I am attempting to get some data out of 4 tables and put it into the respective fields.

       

      I have tried numerous scripts to accomplish this, and am really struggling. A sample of the HTML code is below:

       

       

      <tr class="ysprow1" align="right" height="16"><td class="yspscores" align="left"> 2010-11</td><td class="yspscores" align="left">ATL</td><td class="yspscores">77</td><td class="yspscores">34:17 </td><td class="yspscores">6.5</td><td class="yspscores">13.5</td><td class="yspscores">47.7</td><td> </td><td class="yspscores">0.7</td><td class="yspscores">2.0</td><td class="yspscores">33.1</td><td> </td><td class="yspscores">3.0</td><td class="yspscores">4.1</td><td class="yspscores">72.5</td><td> </td><td class="yspscores">1.7</td><td class="yspscores">6.8</td><td class="yspscores">8.5</td><td> </td><td class="yspscores">3.3</td><td class="yspscores">2.6</td><td class="yspscores">1.3</td><td class="yspscores">1.6</td><td class="yspscores">2.8</td><td class="yspscores">16.5</td><td> </td></tr>

       

      Every number in this code needs to be placed in a particular field. To start, I was simply trying to copy this text into a separate field so I could then attempt to parse each number.

       

      Now, I am here begging for help. Is this even possible? I watched a video of somebody doing it, but he made assumptions I did not understand. I have googled "parse data" and most of the examples are very simple.

       

      What I have tried is to set a variable at the word "ysprow1" and another at "ysprow2" (the beginning of the next table). I then attempted to set a variable using the Middle function. I ran the script and it crashed FM.

       

      Here is a copy of the last script I was using:

       

       

      #Parse the Tables out of the Code

      #First Table
      Loop

      Go to Record/Request/Page[ First ]

      Set Variable [ $t1beg; Value:Position ( NBA Players::url_g; "ysprowl1" ; 1 ; 1 ) ]

      Set Variable [ $t1end; Value:Position ( NBA Players::url_g; "ysprowl2" ; 1 ; 1 ) ]

      Set Variable [ $table1; Value:Middle ( NBA Players::url_g; $t1beg ; $1tend-$t1beg ) ]

      Set Field [ NBA Players::url_l_table_1; $table1 ]

      Go to Record/Request/Page [ Next; Exit after last ]

      End Loop

       

      I know this is the part where experienced FM designers are laughing... I deserve it :) Any help would be greatly appreciated.

        • 1. Re: New, Lost, and Pulling my Hair out!
          comment

          nwhawksfan wrote:

           

          Every number in this code needs to be placed in a particular field. 

           

          I am not sure what these numbers represent, and I suspect they should be placed in separate records of a related table.

           

           

          In any case, you can extract a specific number using:

           

          Let ( [

          prefix = "<td class=\"yspscores\">" ;

          pos = Position ( HTML ; prefix ; 1 ;  $n ) ;

          start = pos + Length ( prefix ) ;

          end = Position ( HTML ; "</td>" ; start ; 1 )

          ] ;

          Case ( pos ;

          Middle ( HTML ; start ; end - start )

          )

          )

           

          where $n is a variable specifying which number to extract. Have your script loop, increasing $n by 1 at each iteration, until:

           

          $n > PatternCount ( HTML ; "<td class=\"yspscores\">" )
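
For anyone who wants to trace the same loop outside FileMaker, here is a rough Python model. It is a sketch only, not the FileMaker script itself: `position` and `middle` are hypothetical helpers that imitate FileMaker's 1-based Position and Middle (forward search only), and `row` is a shortened copy of the sample row from the question.

```python
PREFIX = '<td class="yspscores">'


def position(text, search, start=1, occurrence=1):
    """Hypothetical stand-in for FileMaker's Position: 1-based, 0 if absent."""
    idx = start - 1
    for _ in range(occurrence):
        idx = text.find(search, idx)
        if idx == -1:
            return 0
        idx += 1  # now the 1-based position of this match
    return idx


def middle(text, start, count):
    """Hypothetical stand-in for FileMaker's Middle (1-based start)."""
    if count <= 0:
        return ""
    return text[start - 1:start - 1 + count]


def extract_nth(html, n):
    """Return the contents of the n-th cell that opens with PREFIX."""
    pos = position(html, PREFIX, 1, n)
    if pos == 0:  # plays the role of Case ( pos ; ... )
        return ""
    start = pos + len(PREFIX)
    end = position(html, "</td>", start, 1)
    return middle(html, start, end - start)


def extract_all(html):
    """Loop over n = 1, 2, ... until n exceeds the number of prefixes."""
    values = []
    n = 1
    while n <= html.count(PREFIX):
        values.append(extract_nth(html, n))
        n += 1
    return values


row = ('<tr class="ysprow1"><td class="yspscores" align="left"> 2010-11</td>'
       '<td class="yspscores" align="left">ATL</td>'
       '<td class="yspscores">77</td><td class="yspscores">34:17 </td>'
       '<td class="yspscores">6.5</td><td> </td>'
       '<td class="yspscores">0.7</td></tr>')

print(extract_all(row))
```

Two things show up in the result: the first two cells carry an extra align attribute, so they do not match the prefix and are skipped, and "34:17 " keeps its trailing space. In FileMaker, wrapping the result in Trim ( ) would remove that space.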

           

           

           

          ---

          P.S.

          Please come up with a more descriptive title for your thread.

          • 2. Re: New, Lost, and Pulling my Hair out!
            nwhawksfan

            Thank you for your reply. I must be worse off than I thought with my knowledge of FileMaker. I don't even know where to begin with this script. I went to "Manage Scripts", clicked on the script I am working on, and viewed all scripts by name. Which script do I use to make this happen?

             

            Sorry for the lack of knowledge, and I appreciate your help.

            • 3. Re: New, Lost, and Pulling my Hair out!
              comment

              It's not a script, it's a calculation. You would use it inside a script, most likely within a Set Field [] step - see the attached example.

               

              ---

              Note: this is an intermediate task; you must be familiar with at least the basics of calculations and scripting.

               

              Message was edited by: Michael Horak

              • 4. Re: New, Lost, and Pulling my Hair out!
                nwhawksfan

                I am familiar with calculations and scripting, but I am learning as I use them. I looked at the scripts in your attached files; my struggle is understanding what the script is doing. Specifically:

                 

                Let ( [

                prefix = "<td class=\"yspscores\">" ;

                pos = Position ( $html ; prefix ; 1 ;  $n ) ;

                start = pos + Length ( prefix ) ;

                end = Position ( $html ; "</td>" ; start ; 1 )

                ] ;

                Middle ( $html ; start ; end - start )

                )

                 

                So this is the parsing piece. Could you explain what each line is doing? I think I am going to need to run this calculation repeatedly to extract each of the numbers and place them in the appropriate fields.

                 

                Set Variable [ $n; Value:$n + 1 ]
                Exit Loop If [ $n > PatternCount ( $html ; "<td class=\"yspscores\">" ) ]

                 

                I do not understand this at all.  What is the purpose?

                 

                I am sorry if this is annoying; I am just trying to learn. I have looked all over the internet for help and read blogs, and I am just stuck when it comes to learning this.

                 

                Thanks for your help

                • 5. Re: New, Lost, and Pulling my Hair out!
                  nwhawksfan

                  I got it all figured out except one thing. What is the purpose of the case in the string? I have read about it, but still don't understand it in a practical sense.

                  • 6. Re: New, Lost, and Pulling my Hair out!
                    comment

                    nwhawksfan wrote:

                     

                    I got it all figured out...

                     

                    That's good. Reverse engineering is the best way to learn, IMHO.

                     

                     

                    nwhawksfan wrote:


                    ...except one thing.  What is the purpose of the case in the string? 

                     

                    I am not sure what you're referring to.

                    • 7. Re: New, Lost, and Pulling my Hair out!
                      jrenfrew

                      pos is a variable scoped locally to the calculation; it holds the position of the nth occurrence of the prefix.

                       

                      The Case statement is a way of stepping through multiple tests; as soon as one is true, the result that follows it is returned or calculated. An If is just a Case with only one test.

                       

                      So Case ( a ; b ; c ; d ; e ) in English reads: if a is true then give me b; if not, then if c is true give me d; and if not, then give me e. You don't need the last part if you want an empty (NULL) response.

                       

                      So in this instance

                      If pos exists (i.e. the nth prefix has a position value) then can I have the middle of the string from the start to the end (these being also variables)
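
If it helps to see that reading spelled out, here is a small Python model of it. This is a sketch under assumptions rather than how FileMaker implements Case: `case` is a hypothetical helper, it only handles numeric tests (non-zero counts as true), and unlike FileMaker, which stops at the first true test, Python evaluates every argument before the call.

```python
def case(*args):
    """Model of Case ( test1 ; result1 ; test2 ; result2 ; ... ; default )."""
    pairs = len(args) // 2
    for i in range(pairs):
        test, result = args[2 * i], args[2 * i + 1]
        if test:  # non-zero number counts as true
            return result
    if len(args) % 2 == 1:  # an odd argument count means a default was given
        return args[-1]
    return ""  # no test was true and there is no default: empty result


# Case ( a ; b ; c ; d ; e ) with a false and c true gives d
print(case(0, "b", 1, "d", "e"))

# Case ( pos ; value ) with pos = 0 gives an empty result
print(repr(case(0, "value")))
```

The second call is the situation in the calculation: when the nth prefix is not found, pos is 0, the single test fails, and nothing is returned.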

                       

                      You might find that each time you come across a command or calculation of which you are unsure, the FileMaker help will give you a very good explanation of its meaning and use. These are the basic building blocks of the scripting language and you would do well to explore them one at a time.

                      • 8. Re: New, Lost, and Pulling my Hair out!
                        comment

                        jrenfrew wrote:

                         

                        So in this instance

                        If pos exists (i.e. the nth prefix has a position value) then can I have the middle of the string from the start to the end

                         

                        Oh, is that "the case in the string"? I took that part out in the script, because if the $n-th occurrence of the prefix does not exist, the loop will already have exited.
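
To make the difference concrete, here is a rough Python model of the calculation with and without the guard. It is only a sketch: `position` and `middle` are hypothetical helpers imitating FileMaker's 1-based Position and Middle, and the two-cell `html` string is made up for the illustration.

```python
PREFIX = '<td class="yspscores">'


def position(text, search, start=1, occurrence=1):
    """Hypothetical stand-in for FileMaker's Position: 1-based, 0 if absent."""
    idx = start - 1
    for _ in range(occurrence):
        idx = text.find(search, idx)
        if idx == -1:
            return 0
        idx += 1  # now the 1-based position of this match
    return idx


def middle(text, start, count):
    """Hypothetical stand-in for FileMaker's Middle (1-based start)."""
    if count <= 0:
        return ""
    return text[start - 1:start - 1 + count]


def extract(html, n, guard):
    """The Let ( ) calculation, with or without the Case ( pos ; ... ) guard."""
    pos = position(html, PREFIX, 1, n)
    if guard and pos == 0:
        return ""
    start = pos + len(PREFIX)
    end = position(html, "</td>", start, 1)
    return middle(html, start, end - start)


html = '<td class="yspscores">77</td><td class="yspscores">6.5</td>'
count = html.count(PREFIX)  # plays the role of PatternCount

# Inside the loop, n never exceeds count, so the guard is never needed.
print([extract(html, n, guard=False) for n in range(1, count + 1)])

# Outside that range the guard matters: without it, pos = 0 and the
# calculation returns spurious text instead of an empty result.
print(repr(extract(html, 3, guard=True)))
print(repr(extract(html, 3, guard=False)))
```

So the guard is only redundant as long as the calculation is called with $n no greater than the pattern count, which is what the exit condition ensures.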