
Scraping And Parsing Website Data

Question asked by stevestearns on Jun 13, 2017
Latest reply on Jun 13, 2017 by sam_oda

I am trying to figure out the best method of scraping a web page (using Insert from URL) and parsing the data contained in one of its tables. The page contains multiple tables (see attachment), and I only need to parse out the required fields (i.e., Year, Make, Model, Trim Level, Style, Overall Height, Overall Length, and Overall Width).

 

I know I can use the Position and Middle functions to parse the data, but I am struggling with how to find the text and extract only what I need. For instance, find ">YEAR</td>", then find the next "</td>" occurrence, and grab what is between the preceding ">" and that "</td>", which is "2010" in this example. The actual source code is in a flattened state (one continuous line), rather than with carriage returns after each </tr> as shown below.

 

A snippet of the HTML source code is:

 

<tr><td style="width:40%;">VIN</td><td style="width:60%;">3FAHP0JG6AR413610</td></tr>

<tr><td style="width:40%;">YEAR</td><td style="width:60%;">2010</td></tr>

<tr><td style="width:40%;">MAKE</td><td style="width:60%;">Ford</td></tr>

<tr><td style="width:40%;">MODEL</td><td style="width:60%;">Fusion</td></tr>

<tr><td style="width:40%;">TRIM LEVEL</td><td style="width:60%;">V6 SEL</td></tr>

<tr><td style="width:40%;">ENGINE</td><td style="width:60%;">3.0L V6 DOHC 24V</td></tr>

<tr><td style="width:40%;">STYLE</td><td style="width:60%;">Sedan 4-Dr</td></tr>

<tr><td style="width:40%;">MADE IN</td><td style="width:60%;">Mexico</td></tr>

<tr><td style="width:40%;">STEERING TYPE</td><td style="width:60%;">R&amp;P</td></tr>

<tr><td style="width:40%;">ANTI BRAKE SYSTEM</td><td style="width:60%;"></td></tr>

<tr><td style="width:40%;">TANK SIZE</td><td style="width:60%;"></td></tr>

<tr><td style="width:40%;">OVERALL HEIGHT</td><td style="width:60%;">56.90 Inches</td></tr>

<tr><td style="width:40%;">OVERALL LENGTH</td><td style="width:60%;">190.60 Inches</td></tr>

<tr><td style="width:40%;">OVERALL WIDTH</td><td style="width:60%;">72.20 Inches</td></tr>

<tr><td style="width:40%;">STANDARD SEATING</td><td style="width:60%;">5</td></tr>

<tr><td style="width:40%;">OPTIONAL SEATING</td><td style="width:60%;"></td></tr>

<tr><td style="width:40%;">HIGHWAY MILEAGE</td><td style="width:60%;">31 miles/gallon</td></tr>

<tr><td style="width:40%;">CITY MILEAGE</td><td style="width:60%;">22 miles/gallon</td></tr>

 

There is only one record per web page.
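
To make the find-and-extract logic I described concrete, here is a rough sketch in Python rather than FileMaker. It is only an illustration under assumptions: the function name extract_value and the SAMPLE string are made up for this example, and the find/rfind calls stand in for what Position and Middle would do in a FileMaker calculation. It also assumes each label appears only once in the page, which may not hold when there are multiple tables.

```python
import html

# Hypothetical flattened source, shortened from the snippet above.
SAMPLE = (
    '<tr><td style="width:40%;">YEAR</td><td style="width:60%;">2010</td></tr>'
    '<tr><td style="width:40%;">STEERING TYPE</td><td style="width:60%;">R&amp;P</td></tr>'
    '<tr><td style="width:40%;">TANK SIZE</td><td style="width:60%;"></td></tr>'
    '<tr><td style="width:40%;">OVERALL HEIGHT</td><td style="width:60%;">56.90 Inches</td></tr>'
)

def extract_value(source, label):
    """Return the text of the cell following the label cell, or None if not found."""
    # Step 1: locate the label cell, e.g. ">YEAR</td>".
    marker = ">" + label + "</td>"
    start = source.find(marker)
    if start == -1:
        return None
    after_label = start + len(marker)

    # Step 2: locate the next "</td>", which closes the value cell.
    cell_end = source.find("</td>", after_label)
    if cell_end == -1:
        return None

    # Step 3: the value begins right after the last ">" before that "</td>".
    tag_close = source.rfind(">", after_label, cell_end)
    if tag_close == -1:
        return None

    # Decode entities such as "&amp;" and trim surrounding whitespace.
    return html.unescape(source[tag_close + 1:cell_end]).strip()

for field in ("YEAR", "OVERALL HEIGHT", "STEERING TYPE", "TANK SIZE"):
    print(field, "=", repr(extract_value(SAMPLE, field)))
```

Empty cells such as TANK SIZE come back as an empty string, while a label that is not on the page at all returns None, so the two cases can be told apart.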

 

Any thoughts on how best to script this challenge would be appreciated. I read that someone suggested using Insert from URL and the new cURL and JSON functions in FileMaker Pro 16 Advanced, but I am not sure how those would apply, as the data is not in JSON format. I am using FileMaker Pro 16 Advanced and the data would be extracted on the fly in the field on iOS and/or macOS.

 

Thank you,

Steve Stearns

FileMaker Pro 16 Advanced

 
