16 Replies Latest reply on Jun 13, 2017 2:00 PM by sam_oda

    Scraping And Parsing Website Data

    stevestearns

      I am trying to figure out the best way to scrape a web page (using Insert from URL) and parse the data held in one of its tables. The page contains multiple tables (see attachment), and I only need certain values: Year, Make, Model, Trim Level, Style, Overall Height, Overall Length, and Overall Width.

       

      I know I can use the Position and Middle functions to parse the data, but I am struggling with how to find the text and extract only what I need. For instance, find ">YEAR</td>", then find the next "</td>" occurrence, and grab what is between the ">" that precedes the value and that "</td>", which is "2010" in this example. The actual source code is flattened into a single line, without the (carriage) returns after each </tr> that are shown below.

       

      A snippet of the HTML source code is:

       

      <tr><td style="width:40%;">VIN</td><td style="width:60%;">3FAHP0JG6AR413610</td></tr>

      <tr><td style="width:40%;">YEAR</td><td style="width:60%;">2010</td></tr>

      <tr><td style="width:40%;">MAKE</td><td style="width:60%;">Ford</td></tr>

      <tr><td style="width:40%;">MODEL</td><td style="width:60%;">Fusion</td></tr>

      <tr><td style="width:40%;">TRIM LEVEL</td><td style="width:60%;">V6 SEL</td></tr>

      <tr><td style="width:40%;">ENGINE</td><td style="width:60%;">3.0L V6 DOHC 24V</td></tr>

      <tr><td style="width:40%;">STYLE</td><td style="width:60%;">Sedan 4-Dr</td></tr>

      <tr><td style="width:40%;">MADE IN</td><td style="width:60%;">Mexico</td></tr>

      <tr><td style="width:40%;">STEERING TYPE</td><td style="width:60%;">R&amp;P</td></tr>

      <tr><td style="width:40%;">ANTI BRAKE SYSTEM</td><td style="width:60%;"></td></tr>

      <tr><td style="width:40%;">TANK SIZE</td><td style="width:60%;"></td></tr>

      <tr><td style="width:40%;">OVERALL HEIGHT</td><td style="width:60%;">56.90 Inches</td></tr>

      <tr><td style="width:40%;">OVERALL LENGTH</td><td style="width:60%;">190.60 Inches</td></tr>

      <tr><td style="width:40%;">OVERALL WIDTH</td><td style="width:60%;">72.20 Inches</td></tr>

      <tr><td style="width:40%;">STANDARD SEATING</td><td style="width:60%;">5</td></tr>

      <tr><td style="width:40%;">OPTIONAL SEATING</td><td style="width:60%;"></td></tr>

      <tr><td style="width:40%;">HIGHWAY MILEAGE</td><td style="width:60%;">31 miles/gallon</td></tr>

      <tr><td style="width:40%;">CITY MILEAGE</td><td style="width:60%;">22 miles/gallon</td></tr>

       

      There is only one record per web page.
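
      The find-the-label, then find-the-next-"</td>" idea described above can be sketched as follows. This is only an illustration in Python, not FileMaker code: str.find stands in for Position and slicing stands in for Middle. The names cell_after, SAMPLE, and FIELDS are made up for this sketch, and it assumes each label appears exactly as ">LABEL</td>", that the value cell follows immediately, and that the value cell contains no nested tags.

```python
from html import unescape

# Flattened sample built from rows quoted in the post (no returns between rows).
SAMPLE = (
    '<tr><td style="width:40%;">YEAR</td><td style="width:60%;">2010</td></tr>'
    '<tr><td style="width:40%;">MAKE</td><td style="width:60%;">Ford</td></tr>'
    '<tr><td style="width:40%;">MODEL</td><td style="width:60%;">Fusion</td></tr>'
    '<tr><td style="width:40%;">TRIM LEVEL</td><td style="width:60%;">V6 SEL</td></tr>'
    '<tr><td style="width:40%;">STYLE</td><td style="width:60%;">Sedan 4-Dr</td></tr>'
    '<tr><td style="width:40%;">STEERING TYPE</td><td style="width:60%;">R&amp;P</td></tr>'
    '<tr><td style="width:40%;">TANK SIZE</td><td style="width:60%;"></td></tr>'
    '<tr><td style="width:40%;">OVERALL HEIGHT</td><td style="width:60%;">56.90 Inches</td></tr>'
    '<tr><td style="width:40%;">OVERALL LENGTH</td><td style="width:60%;">190.60 Inches</td></tr>'
    '<tr><td style="width:40%;">OVERALL WIDTH</td><td style="width:60%;">72.20 Inches</td></tr>'
)

# Labels the post asks for (hypothetical constant name).
FIELDS = [
    "YEAR", "MAKE", "MODEL", "TRIM LEVEL", "STYLE",
    "OVERALL HEIGHT", "OVERALL LENGTH", "OVERALL WIDTH",
]


def cell_after(html, label):
    """Return the text of the cell that follows the cell holding `label`.

    Returns None when the label (or a following value cell) is not found.
    Only the first occurrence of the label is used.
    """
    marker = ">" + label + "</td>"
    start = html.find(marker)                 # locate ">YEAR</td>"
    if start == -1:
        return None
    after_label = start + len(marker)         # first character past the label cell
    open_end = html.find(">", after_label)    # end of the value cell's opening <td ...>
    close = html.find("</td>", after_label)   # next "</td>" closes the value cell
    if open_end == -1 or close == -1 or open_end > close:
        return None
    # Text between ">" and "</td>"; decode entities such as &amp;
    return unescape(html[open_end + 1:close]).strip()


# One record per page, so one dictionary of label -> value is enough.
record = {label: cell_after(SAMPLE, label) for label in FIELDS}
print(record)
```

      Including the leading ">" and trailing "</td>" in the search text keeps "MODEL" or "STYLE" from matching inside some other cell's text, and an empty cell such as TANK SIZE simply comes back as an empty string. If the same label occurs in more than one table on the page, the search would need to start from the position of the table of interest rather than from the beginning.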

       

      Any thoughts on how best to script this challenge would be appreciated. I read that someone suggested using Insert from URL and the new cURL and JSON functions in FileMaker Pro 16 Advanced, but I am not sure how those would apply, as the data is not in JSON format. I am using FileMaker Pro 16 Advanced and the data would be extracted on the fly in the field on iOS and/or macOS.

       

      Thank you,

      Steve Stearns

      FileMaker Pro 16 Advanced

       
