5 Replies Latest reply on Sep 10, 2012 10:34 AM by philmodjunk

    How to parse out a piece of text between html tags

    PeterMontague


           I want to parse out the text between the html tags

           <title> Title </title>

           Is there an easy way to do this?

           Peter.

        • 1. Re: How to parse out a piece of text between html tags
          philmodjunk

               Let ( [ T = YourTable::YourTextField ;
                         start = Position ( T ; "<title>" ; 1 ; 1 ) + 7 ;   // 7 = length of "<title>"
                         end = Position ( T ; "</title>" ; 1 ; 1 )
                       ] ;
                       Trim ( Middle ( T ; start ; end - start ) )   // Trim strips leading/trailing spaces
                      )
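A rough Python sketch of the same calculation, for checking the arithmetic outside FileMaker. It assumes FileMaker's Position returns a 1-based index (0 when nothing is found) and that Middle returns empty text for a zero or negative length; the helper names position, middle and title_text are hypothetical stand-ins, not FileMaker functions.

```python
def position(text: str, search: str, start: int = 1) -> int:
    """Approximates Position ( text ; search ; start ; 1 ).

    Returns a 1-based index, or 0 when search is not found
    (str.find's -1 becomes 0 after adding 1).
    """
    return text.find(search, max(start - 1, 0)) + 1


def middle(text: str, start: int, count: int) -> str:
    """Approximates Middle ( text ; start ; count ).

    Assumption: a zero or negative count yields empty text.
    """
    if count <= 0:
        return ""
    begin = max(start - 1, 0)
    return text[begin:begin + count]


def title_text(t: str) -> str:
    # Mirrors the Let() calculation above
    start = position(t, "<title>") + 7  # 7 == len("<title>")
    end = position(t, "</title>")
    # FileMaker's Trim removes leading and trailing spaces
    return middle(t, start, end - start).strip(" ")


print(title_text("<title> Title </title>"))  # Title
```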

          • 2. Re: How to parse out a piece of text between html tags
            PeterMontague

                 Thanks. This got me just about what I wanted. It got me the title and everything else between the tags.

                  

                 Netherland: Amazon.co.uk: Joseph O&#39;Neill: Books

                 So I tried this:

                  

                 Let ( [ T = this::Child Source Code ;
                           start = Position ( T ; "<title>" ; 1 ; 1 ) + 7 ;
                           end = Position ( T ; ": Amazon.co.uk" ; 1 ; 1 )
                         ] ;
                         Trim ( Middle ( T ; start ; end - start ) )
                        )
                 But this got me a blank field.
                  
                 If ": Amazon.co.uk" is my end point, shouldn't that work for me?
            • 3. Re: How to parse out a piece of text between html tags
              philmodjunk

                   But will it be ": Amazon.co.uk" in every instance?

                   From here, it looks like it should work. But if the search string in a Position function doesn't exactly match the text in the field, Position returns zero. That makes end - start a negative number in the Middle function, which produces an empty result.

                   It could be as simple as there being two spaces between the ":" and the "A" in Child Source Code but only one in your calculation. A non-printing character could also look like a space but actually be something else.
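To see why a mismatch produces a blank field, here is a small Python sketch of the arithmetic. It assumes Position behaves like a 1-based search that returns 0 on failure and that Middle returns empty text for a negative length; the sample string, with two spaces after the first colon, is made up for illustration, and position/middle are hypothetical helpers.

```python
def position(text: str, search: str, start: int = 1) -> int:
    # 1-based like FileMaker's Position(); str.find's -1 ("not found") becomes 0
    return text.find(search, start - 1) + 1


def middle(text: str, start: int, count: int) -> str:
    # Assumed behaviour: a zero or negative count yields empty text
    if count <= 0:
        return ""
    return text[start - 1:start - 1 + count]


source = "<title>Netherland:  Amazon.co.uk: Books</title>"  # two spaces after ":"
start = position(source, "<title>") + 7
end = position(source, ": Amazon.co.uk")  # single space, so no match
print(start, end, end - start)  # end is 0, so the length is negative
print(repr(middle(source, start, end - start)))  # ''
```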

                   You might try just using the colon as your text in the "end" calculation.

                   end = Position ( T ; ":" ; start ; 1 )

              • 4. Re: How to parse out a piece of text between html tags
                PeterMontague

                     It is a very long source code. "</title>" works as an endpoint, but ":" does not. Maybe this is because ":" appears earlier in the code.

                     Here is a piece of the code. Can you suggest anything else I could use to choose an accurate endpoint that could be used generically?

                      

                <meta name="title" content="Netherland: Amazon.co.uk: Joseph O&#39;Neill: Books" />

                      

                <meta name="keywords" content="Joseph O&#39;Neill,Netherland,Harper Perennial,0007275706,mon0000066755,Fiction and related items / Modern and contemporary fiction (post c. 1945),Modern &amp; contemporary fiction (post c 1945),Fiction / General,General &amp; Literary Fiction" />

                <title>Netherland: Amazon.co.uk: Joseph O&#39;Neill: Books</title>

                • 5. Re: How to parse out a piece of text between html tags
                  philmodjunk

                       That's why I used start as one of the parameters in the Position function. It should return the position of the first ":" that comes after "<title>".
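A Python sketch of why the start parameter matters, using a shortened copy of the HTML posted above. It assumes the same 1-based, 0-on-failure behaviour for Position; position is a hypothetical helper, and the string slice stands in for Middle ( T ; start ; end - start ).

```python
def position(text: str, search: str, start: int = 1) -> int:
    # 1-based like FileMaker's Position(); 0 means "not found"
    return text.find(search, start - 1) + 1


source = (
    '<meta name="title" content="Netherland: Amazon.co.uk: Joseph O&#39;Neill: Books" />\n'
    "<title>Netherland: Amazon.co.uk: Joseph O&#39;Neill: Books</title>"
)

start = position(source, "<title>") + 7           # first character after <title>
first_colon = position(source, ":")               # searching from 1 hits the meta tag
colon_after_title = position(source, ":", start)  # searching from start skips it

# Equivalent of Middle ( T ; start ; end - start ) with end = colon_after_title
title = source[start - 1:colon_after_title - 1].strip(" ")
print(title)  # Netherland
```

Searching from 1 finds the colon inside the meta tag, which sits before "<title>" and would give a negative length; searching from start only looks at text after the opening tag.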