7 Replies Latest reply on Apr 23, 2016 3:15 PM by jeffmdm

    Restore UL LI Custom Function - Need Some Help to Finish

    jeffmdm

      I'm using the HTMLtoText custom function from briandunning.com to remove all of the tags from some website product descriptions. It places carriage returns (CRs) at the end of every line, so essentially it ends up with a value list of sentences. When it removes <li></li> tags, it replaces them with hard-coded bullets at the beginning of the sentence. For my purposes I'd like to restore the <ul></ul> and <li></li> tags to the cleaned text.

       

      So for example the cleaned text might look like this:

      Lorem ipsum dolor sit amet

      Lorem ipsum dolor sit amet

      Lorem ipsum dolor sit amet

      Lorem ipsum dolor sit amet

       

      It's trivial to change the bullets to <li>.  The next step is to change the CR to </li> at the end of a line that begins with <li>, for which I came up with this:

       

      Let ( [ Start = Position ( text ; "<li>" ; 1 ; 1 ) ; End = Position ( text ; "¶" ; Start ; 1 )] ; Replace ( text ; End ; 1 ; "</li>" ) )

       

      which works fine for the first line, resulting in this:

       

      <li>Lorem ipsum dolor sit amet</li><li>Lorem ipsum dolor sit amet

      <li>Lorem ipsum dolor sit amet

      <li>Lorem ipsum dolor sit amet


      My question is how to make this operate on the remaining lines that begin with <li>. Note that in actual listings there are usually lines that don't begin with a bullet / <li> before, after, or both before and after the bulleted section.

       

      I know there's a function that counts how many items are in a value list, so that could be used to generate the "n" for a loop, but I don't know how to create the loop to make this run through each line. Maybe the right approach is a script or a recursive function. My head is spinning from reading the manual and looking at other custom functions for something I can follow and make work.

       

      Can someone lend a hand?  Many thanks!

        • 1. Re: Restore UL LI Custom Function - Need Some Help to Finish
          erolst

          jeffmdm wrote:

          So I know there's a function that counts how many items are in a value list, so that could be used to generate the "n" for a loop, but I don't know how to create the loop to make this run through each line.

           

          You could use a recursive function that uses an ever-dwindling list, like

           

          // AddEndTag ( theText ) =

          Let ( [

            line = GetValue ( theText ; 1 ) ;

            rem = MiddleValues ( theText ; 2 ; ValueCount ( theText ) ) ;

            endTag = Case ( Left ( line ; 4 ) = "<li>" ; "</li>" ) ;

            res = line & endTag

            ] ;

            res & Case ( Length ( rem ) ; ¶ & AddEndTag ( rem ) )

          )
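
          As a rough illustration of how the recursion unwinds, here is a trace for a made-up four-line input (the sample text is an assumption for this example, not taken from an actual listing). Each call handles the first line and hands the rest of the list to the next call:

          ```filemaker
          // Hypothetical input (4 values):
          //   Intro text¶<li>First item¶<li>Second item¶Closing text
          //
          // MiddleValues returns each value with a trailing ¶, hence the ¶ at the end of rem.
          // Call 1: line = "Intro text"        rem = "<li>First item¶<li>Second item¶Closing text¶"
          // Call 2: line = "<li>First item"    rem = "<li>Second item¶Closing text¶"
          // Call 3: line = "<li>Second item"   rem = "Closing text¶"
          // Call 4: line = "Closing text"      rem = ""   (Length ( rem ) is 0, so the recursion stops)

          AddEndTag ( "Intro text¶<li>First item¶<li>Second item¶Closing text" )

          // Result:
          //   Intro text
          //   <li>First item</li>
          //   <li>Second item</li>
          //   Closing text
          ```

          Lines that do not start with <li> pass through unchanged, so text before or after the bulleted section is left alone.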

           

          If you pass the tag into the function, you can generalize this to

           

          // AddEndTag ( theText ; theTag ) =

          Let ( [

            pureTag = Substitute ( theTag ; [ "<" ; "" ] ; [ ">" ; "" ] ) ;

            openTag = "<" & pureTag & ">" ;

            line = GetValue ( theText ; 1 ) ;

            rem = MiddleValues ( theText ; 2 ; ValueCount ( theText ) ) ;

            endTag = Case ( Left ( line ; Length ( openTag ) ) = openTag ; "</" & pureTag & ">" ) ;

            res = line & endTag

            ] ;

            res & Case ( Length ( rem ) ; ¶ & AddEndTag ( rem ; theTag ) )

          )
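
          The original question asked about a loop rather than recursion. This thread predates it, but assuming FileMaker 18 or later, the built-in While function can express the same line-by-line pass as a plain calculation. The following is only a sketch under that assumption; text stands in for whatever field or parameter holds the cleaned description:

          ```filemaker
          // Loop variant (assumes FileMaker 18+, where While is available)
          While ( [
            n = ValueCount ( text ) ;   // number of lines to visit
            i = 0 ;                     // current line number
            result = ""
            ] ;
            i < n ;
            [
            i = i + 1 ;
            line = GetValue ( text ; i ) ;
            // put a return between lines, and close the tag on lines that start with <li>
            result = result & Case ( i > 1 ; ¶ ) & line & Case ( Left ( line ; 4 ) = "<li>" ; "</li>" )
            ] ;
            result
          )
          ```

          In earlier versions the recursive custom function remains the way to go.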

          • 2. Re: Restore UL LI Custom Function - Need Some Help to Finish
            jeffmdm

            erolst, thank you for this, I think maybe it's the answer or most of it.  I'll try it this evening when I get home to my database.  But a few questions:

             

            Is this going through the text line by line or character by character? I'm struggling to understand how it works, but I think maybe it's character by character? What is it that makes the list dwindle: is it the "2" in rem = MiddleValues ( theText ; 2 ?

             

             

             

             

             

            GetValue: Returns the requested value given by valueNumber from listOfValues.

            MiddleValues: Returns a text result containing the specified numberOfValues in text, starting with startingValue.

            ValueCount: Returns a count of the total number of values in text.

            • 3. Re: Restore UL LI Custom Function - Need Some Help to Finish
              bigtom

              The custom function is editable, so you can make a concession there by substituting a placeholder for <ul></ul> and <li></li> before all the tags are removed. If each tag is replaced by a unique character string, it is easy to identify later and swap back once the other tags are gone. For example, "*!?_SxUa?!*" or some other string that is unlikely ever to appear in the original text.

               

              You could also use a simple substitution before and after the CF:

               

              Substitute( HTMLtoText ( Substitute( text; ["<ul>"; "*!?_SxUa?!*"]; ["</ul>"; "*!?_OdMq?!*"]; ["<li>"; "*!?_YmLq?!*"]; ["</li>"; "*!?_CzAr?!*"]); "single"); ["*!?_SxUa?!*"; "<ul>" ]; ["*!?_OdMq?!*"; "</ul>"]; ["*!?_YmLq?!*"; "<li>"]; ["*!?_CzAr?!*"; "</li>"])

              • 4. Re: Restore UL LI Custom Function - Need Some Help to Finish
                jeffmdm

                Bigtom, thank you, the problem with this is that the ul and li tags in the original text usually have style and other markup inside of them, so they don't lend themselves to simple substitutions.  I tried to look over the HTMLtoText function to have it reduce to simple <ul> and <li> tags and it was over my head to figure out.  Thanks anyway!

                • 5. Re: Restore UL LI Custom Function - Need Some Help to Finish
                  erolst

                  jeffmdm wrote:

                  Is this going through the text line by line or character by character? I'm struggling to understand how it works, but I think maybe it's character by character? What is it that makes the list dwindle: is it the "2" in rem = MiddleValues ( theText ; 2 ?

                   

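                  The function works line by line, and yes, the "2" is what makes the list dwindle: MiddleValues ( theText ; 2 ; … ) returns everything from line 2 onward, so each call passes on a list that is one line shorter. The status of the rem variable serves as the exit condition (never enter a loop without one …); the first line of the argument (theText) is processed, then the next recursion is called with everything from line 2; if there is no line 2, it stops.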


                  To make it totally clear, one could use more variables, with more explicit names:


                  // AddMissingEndTag ( theText ) =

                  Let ( [

                    theCurrentLine = GetValue ( theText ; 1 ) ;

                    toTheRemainingLines = MiddleValues ( theText ; 2 ; ValueCount ( theText ) - 1 ) ;

                    thereAreMoreLinesToFollow = not IsEmpty ( toTheRemainingLines ) ;

                    currentLineHasOpeningTag = Left ( theCurrentLine ; 4 ) = "<li>" ;

                    possibleEndTag = Case ( currentLineHasOpeningTag ; "</li>" ) ;

                    theProcessedLine = theCurrentLine & possibleEndTag ;

                    addAReturnCharacter = ¶ ;

                    otherwiseHaltTheRecursion = ""

                    ] ;

                    // return as result …

                    theProcessedLine

                    &

                    If ( thereAreMoreLinesToFollow ;

                      addAReturnCharacter & /* recursively apply the function */ AddMissingEndTag ( toTheRemainingLines ) ;

                      otherwiseHaltTheRecursion

                    )

                  )

                   

                  This is a bit over the top, but you get the idea …
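
                  The original post also mentions restoring the <ul></ul> tags, which the functions above do not do. The same recursive pattern could be extended with a second parameter that remembers whether the previous line was a list item. This is only a sketch: the function name WrapListItems and the parameter insideList are made up for illustration, and it assumes every bulleted line already starts with <li>.

                  ```filemaker
                  // WrapListItems ( theText ; insideList ) =
                  // Hypothetical helper; call it with insideList = 0 for the first line.
                  Let ( [
                    line = GetValue ( theText ; 1 ) ;
                    rem = MiddleValues ( theText ; 2 ; ValueCount ( theText ) ) ;
                    isItem = Left ( line ; 4 ) = "<li>" ;
                    // open the list on the first item of a run
                    openUl = Case ( isItem and ( not insideList ) ; "<ul>¶" ) ;
                    // close the list on the first non-item line after a run
                    closeUl = Case ( insideList and ( not isItem ) ; "</ul>¶" ) ;
                    res = closeUl & openUl & line
                    ] ;
                    res &
                    Case (
                      Length ( rem ) ; ¶ & WrapListItems ( rem ; isItem ) ;
                      isItem ; "¶</ul>"   // text ends on a list item, so close the list here
                    )
                  )
                  ```

                  Used after the end tags have been added, for example WrapListItems ( AddMissingEndTag ( text ) ; 0 ), this would put <ul> on its own line before each run of <li> lines and </ul> on its own line after it.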

                  • 6. Re: Restore UL LI Custom Function - Need Some Help to Finish
                    bigtom

                    Then you are better off using the first suggestion.

                    • 7. Re: Restore UL LI Custom Function - Need Some Help to Finish
                      jeffmdm

                      The function works great, thanks very much for your help!