6 Replies Latest reply on Jun 1, 2017 6:46 AM by CP42Kx07

    Incomplete Web Content

    CP42Kx07

      When I use the “Insert from URL” script step in FileMaker Pro Advanced 15 to get the content of a particular web page, the resulting data is incomplete.

       

      The issue also exists if I take the more circuitous route of using a web viewer and the GetLayoutObjectAttribute function.

       

      However, if I enter the URL in Safari and view the web page elements via the Develop / Show Web Inspector menu, the content is complete.

       

      The web page (using GOOG as an example stock):

       

      https://finance.yahoo.com/quote/GOOG/history?p=GOOG

       

      The missing HTML section (cookie-related):

       

      <a class="Fl(end) Mt(3px) Cur(p)" href="https://query1.finance.yahoo.com/v7/finance/download/GOOG?period1=1493593200&period2=1496185200&interval=1d&events=history&crumb=0npt/GhF5vf" download="GOOG.csv">

      <svg class="Va(m)! Mend(5px) Stk($actionBlue)! Fill($actionBlue)! Cur(p)" width="15" height="15" viewBox="0 0 48 48" data-icon="download" style="fill: rgb(0, 129, 242); stroke: rgb(0, 129, 242); stroke-width: 0px; vertical-align: bottom;">

      <path d="M43.002 43.002h-38c-1.106 0-2.002-.896-2.002-2v-11c0-1.105.896-2 2.002-2 1.103 0 1.998.895 1.998 2v9h34.002v-9c0-1.105.896-2 2-2s2 .895 2 2v11c0 1.103-.896 2-2 2m-19-8L11.57 23.307c-.75-.748-.75-1.965 0-2.715.75-.75 1.965-.75 2.715 0l7.717 7.716V2h4v26.308l7.717-7.716c.75-.75 1.964-.75 2.714 0s.75 1.967 0 2.715L24.002 35.002z">

      </path>

      </svg>

      <span>Download Data</span>

      </a>

       

      n.b. period1=, period2= & crumb= values will vary.
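
      To make the varying parts of that link easier to see, here is a minimal Python sketch that splits the sample href into its query parameters. The helper name `describe_download_url` is made up for illustration, and reading period1/period2 as Unix epoch seconds is an assumption that the thread does not confirm.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit, parse_qs

# Sample download link copied from the post; the crumb value is session-specific.
DOWNLOAD_URL = (
    "https://query1.finance.yahoo.com/v7/finance/download/GOOG"
    "?period1=1493593200&period2=1496185200&interval=1d"
    "&events=history&crumb=0npt/GhF5vf"
)


def describe_download_url(url):
    """Split the link into its query parameters and decode the date range."""
    parts = urlsplit(url)
    # parse_qs returns a list per key; each key appears once here, so keep the first value.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    # Assumption: period1/period2 are Unix epoch seconds (not confirmed in the thread).
    for key in ("period1", "period2"):
        moment = datetime.fromtimestamp(int(params[key]), tz=timezone.utc)
        params[key + "_utc"] = moment.isoformat()
    # The ticker symbol is the last path segment of the link.
    params["symbol"] = parts.path.rsplit("/", 1)[-1]
    return params


info = describe_download_url(DOWNLOAD_URL)
for name, value in sorted(info.items()):
    print(name, "=", value)
```

      Under that assumption the sample link covers roughly one month of daily data. The crumb cannot be derived this way; it still has to come from the rendered page.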

       

      The missing content relates to the information contained within the Download Data link (see sample file). Interestingly, the content can be accessed in the web viewer version by right-clicking and copying the link (just as in Safari).

       

      I presume that the reason is a difference between FileMaker’s treatment of web pages (via Insert from URL & GetLayoutObjectAttribute) and that of web browsers such as Safari.

       

      Does anyone know of a work-around for this limitation? Perhaps I could try using AppleScript? Any thoughts would be much appreciated.

       

      Thanks

        • 1. Re: Incomplete Web Content
          beverly

          Get a feed from the website managers rather than trying to scrape a page. If you happen on a solution, keep in mind that it can change at the whim of the developers and/or owners.

          Beverly

          • 2. Re: Incomplete Web Content
            CP42Kx07

            Beverly

             

            Thank you for your response. I certainly agree that some form of feed would make sense for a commercial solution but for occasional non-commercial use it is not quite so feasible.

             

            Actually, the Yahoo link etc. provided was more of an example to demonstrate an apparent FileMaker limitation in this area. I am still interested in discovering whether there is a more certain way to extract the complete HTML of a web page (as viewable in a web browser) from within FileMaker. It may be that there is a simple way to do this, but I have been unable to find it, so I shall now consider trying AppleScript. I believe that there may be a way using JavaScript, but I have no experience with this (and indeed am not yet clear whether one can run JavaScript from within FileMaker).

            • 3. Re: Incomplete Web Content
              beverly

              Part of the "problem" may be the way any page is generated & rendered.

              If you look at the source in the browser, it may not be HTML that can be easily parsed.

               


              • 4. Re: Incomplete Web Content
                TomHays

                I think the issue you are experiencing is because the "missing html" is not in the source HTML code. I think that it is produced by the web browser running JavaScript code: the missing part is generated by a JavaScript program that runs when the page is rendered by the browser engine.

                 

                Both Insert From URL and scraping the Web Viewer will retrieve the contents of the source HTML file contained at the web link.  They won't run any programs (e.g. JavaScript) or render any HTML.

                 

                In the Web Viewer the action of rendering the HTML and running JavaScript is outsourced to the browser engine provided by the operating system.  That browser engine is then given permission by FileMaker to display the rendered content in the space provided by the Web Viewer on your layout.  FileMaker does not provide any means of accessing the rendered web page directly (e.g. web page scraping) since it can't "see" the content inside the web viewer.  FileMaker only knows what source content it asked the browser engine to render for it.

                 

                When you use the Safari Web Inspector tool and look under the Elements section, you are seeing the HTML after the <script> sections have been run and generated more HTML. The Resources section of the Web Inspector displays the source info that is available by Insert From URL and scraping the Web Viewer.
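
                The difference between the source and the rendered page can be sketched in Python with the standard-library HTML parser. Both HTML strings below are hypothetical stand-ins (they are not Yahoo's actual markup), and `find_download_links` is an illustrative helper: the first string mimics what a plain fetch returns, the second mimics what the Elements view shows after the script has run.

```python
from html.parser import HTMLParser


class DownloadLinkFinder(HTMLParser):
    """Collect href values of <a> tags that carry a download attribute."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and "download" in attributes:
            self.links.append(attributes.get("href"))


def find_download_links(html):
    """Return the hrefs of all download links found in the given markup."""
    finder = DownloadLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.links


# Hypothetical raw source: the link exists only as JavaScript that has not run yet.
SOURCE_HTML = """
<html><body>
<div id="app"></div>
<script>
  var a = document.createElement("a");
  a.setAttribute("download", "GOOG.csv");
  document.getElementById("app").appendChild(a);
</script>
</body></html>
"""

# Hypothetical DOM after a browser engine has executed the script.
RENDERED_HTML = """
<html><body>
<div id="app"><a href="https://example.com/GOOG.csv" download="GOOG.csv">Download Data</a></div>
</body></html>
"""

source_links = find_download_links(SOURCE_HTML)
rendered_links = find_download_links(RENDERED_HTML)
print("links in source:", source_links)
print("links after rendering:", rendered_links)
```

                A parser that only sees the source finds no link, because the parser does not execute the script; the link appears only in the markup produced after rendering.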

                 

                Workarounds are not easy since you are asking for the results of the web rendering engine.  By its nature this JavaScript-provided HTML will change to fit the circumstances.  You can expect that it will be different for different rendering engines and even for different user sessions.

                 

                For the situation you provided of accessing the file referenced by the JavaScript-generated download link, I'd pursue Beverly's suggestion about finding an alternate route to the data.

                 

                I don't have a good suggestion for the general case of scraping content generated by <script> sections in the web page.  People write dedicated "web bots" to handle more sophisticated web page interactions like this.

                 

                -Tom

                • 5. Re: Incomplete Web Content
                  beverly

                  Thanks, Tom! More detailed than what I said.

                  Beverly

                  • 6. Re: Incomplete Web Content
                    CP42Kx07

                    Great, Tom. That seems like an excellent explanation of what I (vaguely) suspected might be happening. It still leaves me with a slight hope that I may be able to use an AppleScript to access the content of the link but as I know next to nothing about scripting it may take some time!