4 Replies Latest reply on Sep 11, 2012 12:32 PM by philmodjunk

    How to get a word that is four words before a unique sequence

    PeterMontague

           I want to be able to parse out this word from a string of text.

           <li><b>Paperback:</b> 176 pages</li>

           The only unique identifier is "pages</li>"

           I want to parse out the word "Paperback", which I hope will always be four words before this.

           Any suggestions?

        • 1. Re: How to get a word that is four words before a unique sequence
          MarcMcCall

               You could use a function similar to this.

               If ( PatternCount ( yourtextfield ; "Paperback" ) ; "Paperback" ; "" )

          • 2. Re: How to get a word that is four words before a unique sequence
            philmodjunk

                 I would guess that the word isn't always "Paperback"--might be "Hardcover" and maybe even "ebook" or "kindle" these days.

                 You could use PatternCount for all of them, but you can also use the MiddleWords function to extract the exact text used.

                 Let ( [ PgPos = Position ( YourText ; "Pages" ; 1 ; 1 ) - 1 ;
                          T = Right ( YourText ; PgPos ) ] ;
                          MiddleWords ( T ; WordCount ( T ) - 2 ; 1 )
                        )

                 Note: I didn't have time to test this one, so the - 2 term may need to be adjusted. I'm not sure how the HTML tags will parse into individual words.

            • 3. Re: How to get a word that is four words before a unique sequence
              PeterMontague

                   Thanks PhilModjunk and Marc

                   I've tried PhilModJunk's method first. 

                    

                   Let ( [ PgPos = Position ( this::Child Source Code ; "Pages</li>" ; 1 ; 1 ) - 1 ;
                            T = Right ( this::Child Source Code ; PgPos ) ] ;
                            MiddleWords ( T ; WordCount ( T )  - 3 ; 1 )
                          )
                    
                   I presume that the number subtracted on the second-to-last line is supposed to move me back four words before "Pages</li>".
                   But it's actually finding a piece of text very far away in the document.

                   <li><b>Paperback:</b> 176 pages</li>

                   I'm not sure what to do now.

                   Marc, I think your method would work too.
                   There are about twenty different words that could fit in there.
                   Is the following the right way to go?

                   If ( PatternCount ( this::Child Source Code ; "<li><b>Paperback:</b>" ) ; "Paperback" ; "" ) &
                   If ( PatternCount ( this::Child Source Code ; "<li><b>Hardcover:</b>" ) ; "Hardcover" ; "" )

              • 4. Re: How to get a word that is four words before a unique sequence
                philmodjunk

                     How silly of me, I went right when I should have gone left:

                     Let ( [ PgPos = Position ( YourText ; "Pages</li>" ; 1 ; 1 ) - 1 ;
                              T = Left ( YourText ; PgPos ) ] ;
                              MiddleWords ( T ; WordCount ( T ) - 2 ; 1 )
                            )
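
                     If counting words turns out to be fragile (how the HTML tags break into words is still untested), a variation is to skip the word count and locate the tags around the label directly. This is only a sketch and has not been tested: it assumes the label always sits between "<b>" and ":</b>" just before "pages</li>", as in the sample line, and it relies on Position scanning backward when given a negative occurrence. YourText is the same placeholder as above; src, pgPos, openPos and closePos are names made up for this example.

```
Let ( [
  src = YourText ;
  // Position is not case sensitive, so "pages</li>" also matches "Pages</li>"
  pgPos = Position ( src ; "pages</li>" ; 1 ; 1 ) ;
  // A negative occurrence scans backward from pgPos, toward the start of the text
  openPos = Position ( src ; "<b>" ; pgPos ; -1 ) ;
  closePos = Position ( src ; ":</b>" ; pgPos ; -1 )
] ;
  // Return the text between "<b>" and ":</b>" only if all three markers were found in order
  If ( pgPos > 0 and openPos > 0 and closePos > openPos ;
    Middle ( src ; openPos + 3 ; closePos - openPos - 3 ) ;
    ""
  )
)
```

                     With the sample line this should pick out whatever sits between the two tags, so it would also cope with a label of more than one word, if any of the twenty or so possible labels are like that.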