3 Replies Latest reply on Aug 5, 2013 5:17 AM by mikebeargie

    Insert from URL results different from Web Viewer content?

    calexmac

      I have found that the HTML returned by Insert from URL is different from the content of a web viewer (read with GetLayoutObjectAttribute and the "content" attribute).

       

      The differences are small for the web site I am looking at:

      - of ~2k lines of html, the web viewer has ~20 additional lines of data

      - the additional data seems to be associated with code for scripts (though other script data is included elsewhere).
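
      One way to pin down exactly which lines are extra is to export both captures and diff them outside FileMaker. A minimal sketch using Python's standard difflib follows; the function name extra_lines and the sample strings are made up for illustration and are not taken from the page in question. Lines that merely changed (such as the clock timestamp) will also show up, so expect a little noise.

```python
import difflib


def extra_lines(url_html: str, viewer_html: str) -> list[tuple[int, str]]:
    """Return (line_number, text) for lines that appear only in viewer_html.

    Line numbers are 1-based and refer to the web viewer capture.
    Lines that were changed (not just added) are reported as well.
    """
    a = url_html.splitlines()
    b = viewer_html.splitlines()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    extras = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            extras.extend((j + 1, b[j]) for j in range(j1, j2))
    return extras


# Hypothetical sample captures: the web viewer copy has one extra script line.
url_html = (
    "<html>\n<head>\n<title>T</title>\n</head>\n"
    "<body>\n<p>Hi</p>\n</body>\n</html>"
)
viewer_html = (
    "<html>\n<head>\n<title>T</title>\n<script>var x = 1;</script>\n</head>\n"
    "<body>\n<p>Hi</p>\n</body>\n</html>"
)

for number, text in extra_lines(url_html, viewer_html):
    print(number, text)
```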

       

      Of course there is a difference in a clock timestamp because the two script commands are executing about 1 second apart, but that is not the concern.

       

      Any thoughts on why Insert from URL would NOT get/include everything that the web viewer needs to display?

       

      tx

      C

        • 1. Re: Insert from URL results different from Web Viewer content?
          mikebeargie

          Insert From URL is supposed to fully load the content of the target URL into memory before it writes it into your target field. Scraping the web viewer via GetLayoutObjectAttribute, though, does not wait for all loading to complete.

           

          Based on a page rendering ~2,000 lines of code, I'd say the roughly 1% difference (~20 lines) comes from the context differences between the two functions. I'm fairly certain that their base functionality is not the same, and Insert From URL has nothing to do with the web viewer object as far as I know.

           

          If the page you are comparing has dynamic header content that is rendered based on the requesting client (i.e., by a CMS or other framework), then that could explain the difference you are seeing.

           

          I can confirm, though, that I've seen this difference before, and since FileMaker 12 I've almost exclusively favored Insert From URL for more accurate web scraping results.

          • 2. Re: Insert from URL results different from Web Viewer content?
            calexmac

            Mike

             

            I have been careful to check for a complete load before scraping the web viewer with GetLayoutObjectAttribute. And indeed the difference I observe is fairly high up in the HTML stream, around line 125.

            I suspect you are correct in identifying the context differences of the two functions as the cause.

            It is certainly true that there is dynamic content on the page I am looking at.

             

            Tx for your thoughts.

             

            rgds

            C

            • 3. Re: Insert from URL results different from Web Viewer content?
              mikebeargie

              Most people scrape between the body tags, so you might want to run a difference comparison to see whether that section differs between the two. Or is there a reason you need to scrape out of the header?
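
              As a rough sketch of that body-only check, assuming you have saved both captures as text: the helper names body_of, normalize and bodies_match and the sample strings below are hypothetical, and the regex is a naive way to grab the body, fine for a quick comparison but not a real HTML parser.

```python
import re

# Naive match for everything between <body ...> and </body>.
BODY_RE = re.compile(r"<body\b[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)


def body_of(html: str) -> str:
    """Return the text between the body tags, or '' if none are found."""
    match = BODY_RE.search(html)
    return match.group(1) if match else ""


def normalize(text: str) -> list[str]:
    """Strip surrounding whitespace and drop blank lines before comparing."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def bodies_match(first_html: str, second_html: str) -> bool:
    """True when the two captures have the same body, ignoring whitespace."""
    return normalize(body_of(first_html)) == normalize(body_of(second_html))


# Hypothetical captures that differ only in the header.
url_capture = "<html><head><title>T</title></head>\n<body>\n<p>Hi</p>\n</body></html>"
viewer_capture = (
    "<html><head><title>T</title><script>var x = 1;</script></head>\n"
    "<body>\n  <p>Hi</p>\n</body></html>"
)

print(bodies_match(url_capture, viewer_capture))
```

              If the bodies match, the header differences probably don't matter for what you're scraping.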

               

              I would compare what I need out of the code result rather than focus on the differences. Stuff like that can drive you up the wall and have no real impact on what you're trying to do. (We've ALL been there too.)