11 Replies Latest reply on Dec 12, 2013 12:15 PM by PeterMontague

    Custom function to take away html code

    PeterMontague

           I'm using this custom function from Brian Dunning. I want to strip all HTML except "</br>". The function doesn't even reference "</br>", yet it removed all of my "</br>" tags.

            

           // Convert HTML text to plain text, preserving bold and italic styles. Only the HTML body is processed. Multiple spaces are removed.
           // Version: 1.06.
           // Parameters: "text" = HTML text to be converted. "returns" = "single" (no empty lines allowed) or "double" (at most one empty line allowed).
           // NOTE 1: This version is only for use in FileMaker Pro 10 or higher.
           // NOTE 2: If you need to convert other HTML encodings like "&#230;" -> "æ", you may add the needed substitution pairs in the "//Transform" section.
           // NOTE 3: Requires HTMLtoText_deleteTags ( text ) and HTMLtoText_convertStyle ( text )
           // 24oct06 ; Mogens Brun ; FM Integrator ; mogens_brun@mac.com
           // Mail to mogens_brun@mac.com if you want a demo file with the custom function and instructions for using the function in earlier versions than FileMaker 10.
            
           Let (
                 [
           //Setup
                 $Returns = If ( returns = "single" ; 1 ; 2 ) ;
                 $Return1 = Left ( "¶¶" ; $Returns ) ;
                 $Return2 = Left ( "¶¶¶" ; $Returns + 1 ) ;
                 $Return3 = Left ( "¶¶¶¶" ; $Returns + 2 ) ;
                 $Return4 = Left ( "¶¶¶¶¶¶" ; $Returns + 4 ) ;
                 $Return5 = Left ( "¶¶¶¶¶¶¶¶¶¶" ; $Returns + 8 ) ;
           //Remove form, script, noscript, option and input elements, isolate the body, and remove comment lines starting with "//" (they may contain characters that unbalance the tags)
                 $SearchStart = "<form " ;
                 $SearchEnd = "</form>" ;
                 $Text = HTMLtoText_deleteTags ( text ) ;
                 $SearchStart = "<script" ;
                 $SearchEnd = "</script>" ;
                 $Text = HTMLtoText_deleteTags ( $Text ) ;
                 $SearchStart = "<noscript" ;
                 $SearchEnd = "</noscript>" ;
                 $Text = HTMLtoText_deleteTags ( $Text ) ;
                 $SearchStart = "<option" ;
                 $SearchEnd = "</option>" ;
                 $Text = HTMLtoText_deleteTags ( $Text ) ;
                 $SearchStart = "<input " ;
                 $SearchEnd = " />" ;
                 $Text = HTMLtoText_deleteTags ( $Text ) ;
                 $StartBody = Position ( $Text ; "<body" ; 1 ; 1 ) ;
                 $SearchStart = "¶//" ;
                 $SearchEnd = "¶" ;
                 $Text = HTMLtoText_deleteTags ( Right ( $Text ; Length ( $Text ) - $StartBody + 1 ) ) ;
           //Convert bold style
                 $SearchStart = "<b>" ;
                 $SearchEnd = "</b>" ;
                 $Text = HTMLtoText_convertStyle ( $Text ) ;
                 $SearchStart = "<h1" ;
                 $SearchEnd = "</h1>" ;
                 $Text = HTMLtoText_convertStyle ( $Text ) ;
                 $SearchStart = "<h2" ;
                 $SearchEnd = "</h2>" ;
                 $Text = HTMLtoText_convertStyle ( $Text ) ;
                 $SearchStart = "<h3" ;
                 $SearchEnd = "</h3>" ;
                 $Text = HTMLtoText_convertStyle ( $Text ) ;
                 $SearchStart = "<strong>" ;
                 $SearchEnd = "</strong>" ;
                 $Text = HTMLtoText_convertStyle ( $Text ) ;
           //Convert italic style
                 $SearchStart = "<i>" ;
                 $SearchEnd = "</i>" ;
                 $Text = HTMLtoText_convertStyle ( $Text ) ;
                 $SearchStart = "<em>" ;
                 $SearchEnd = "</em>" ;
                 $Text = HTMLtoText_convertStyle ( $Text ) ;
           //Delete HTML tags and convert special characters
                 $SearchStart = "<" ;
                 $SearchEnd = ">" ;
                 $Text = HTMLtoText_deleteTags ( Substitute ( $Text ; 
           //Blank
              [ "¶" ; "" ] ;
              [ "</li>" ; "" ] ;
              [ "<tr>" ; "¶" ] ;
              [ "</tr>" ; "" ] ;
              [ "</a>" ; "" ] ;
              [ "</td>" ; "" ] ;
              [ "&middot;" ; "" ] ;
              [ "<td>" ; "" ] ; 
              [ "’" ; "'" ] ; 
              [ Char ( 10 ) ; "" ] ;
              [ Char ( 65533 ) ; "" ] ;
              [ Char ( 0 ) ; "" ] ;
           //Space
              [ "</div>" ; " </div>" ] ;
              [ "</option>" ; " </option>" ] ;
              [ "<a href" ; " <a href" ] ;
              [ "</span>" ; " " ] ;
              [ "|" ; " | " ] ;
           // Transform
              [ "<td>" ; Char ( 9 ) ] ;
              [ "&nbsp;" ; " " ] ; //non-breaking space
              [ "<li>" ; "¶• " ] ;
              [ "<li " ; "¶• <li " ] ;
              [ "</p>" ; "¶¶" ] ;
              [ "<p>" ; "¶¶" ] ;
              [ "<p " ; "¶¶<p " ] ;
              [ "<br>" ; "¶" ] ;
              [ "<br />" ; "¶" ] ;
              [ "<tr " ; "¶<tr " ] ;
              [ "<ul>" ; "¶¶" ] ;
              [ "</ul>" ; "¶¶" ] ;
              [ "</table>" ; "¶¶" ] ;
              [ "&amp;" ; "&" ] ;
              [ "&hellip;" ; "…" ] ;
              [ "&quot;" ; "\"" ] ;
              [ "&#8220;" ; "\“" ] ;
              [ "&#8221;" ; "\”" ] ;
              [ "&#174;" ; "®" ] ;
              [ "&copy;" ; "©" ] ;
              [ "&ldquo;" ; "\“" ] ;
              [ "&rdquo;" ; "\”" ] ;
              [ "&raquo;" ; "»" ] ;
              [ "&laquo;" ; "«" ] ;
              [ "&gt;" ; ">" ] ;
              [ "&lt;" ; "<" ] ;
              [ "&#8217;" ; "'" ] ;
              [ "&#8482;" ; "™" ] ;
              [ "&#197;" ; "Å" ] ;
              [ "&#196;" ; "Ä" ] ;
              [ "&#198;" ; "Æ" ] ;
              [ "&#201;" ; "É" ] ;
              [ "&#214;" ; "Ö" ] ;
              [ "&#216;" ; "Ø" ] ;
              [ "&#220;" ; "Ü" ] ;
              [ "&#228;" ; "ä" ] ;
              [ "&#229;" ; "å" ] ;
              [ "&#230;" ; "æ" ] ;
              [ "&#233;" ; "é" ] ;
              [ "&#246;" ; "ö" ] ;
              [ "&#248;" ; "ø" ] ;
              [ "&#252;" ; "ü" ] ;
           //Clean space & tab & return
              [ " " ; " " ] ;
              [ "         " ; " " ] ;
              [ "     " ; " " ] ;
              [ "   " ; " " ] ;
              [ "  " ; " " ] ;
              [ "> <" ; "" ] ;
              [ "><" ; "" ] ;
              [ " ¶" ; "¶" ] ;
              [ "¶ " ; "¶" ]            ) )
                 ;
                 $SearchStart = "&#" ;
                 $SearchEnd = ";"
                 ] ;
                 Substitute ( "Ÿ" & HTMLtoText_deleteTags ( $Text ) & "Ÿ"  ;
           //Clean successive returns
              [ $Return5 ; $Return1 ] ;
              [ $Return4 ; $Return1 ] ;
              [ $Return3 ; $Return1 ] ;
              [ $Return2 ; $Return1 ] ;
           //Trim for returns at start and end
              [ "Ÿ¶¶" ; "" ] ;
              [ "Ÿ¶" ; "" ] ;
              [ "¶¶Ÿ" ; "" ] ;
              [ "¶Ÿ" ; "" ] ;
              [ "Ÿ" ; "" ]
            )
           )
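To trace the last step of the function outside FileMaker, here is a rough Python model of the "Clean successive returns" and "Trim" pairs. It is not FileMaker code and the name collapse_returns is hypothetical; it assumes FileMaker's ¶ is a carriage return (Char ( 13 )) and that a multi-pair Substitute applies its pairs one after another, like consecutive str.replace calls.

```python
CR = "\r"  # FileMaker's ¶ is a carriage return, Char ( 13 )
SENTINEL = "Ÿ"  # HTMLtoText wraps the text in Ÿ to mark its start and end


def collapse_returns(text: str, returns: str = "single") -> str:
    """Hypothetical Python model of the final Substitute in HTMLtoText."""
    n = 1 if returns == "single" else 2  # $Returns
    r1 = CR * n  # $Return1
    text = SENTINEL + text + SENTINEL
    # //Clean successive returns: $Return5..$Return2 hold n+8, n+4, n+2, n+1 returns
    for extra in (8, 4, 2, 1):
        text = text.replace(CR * (n + extra), r1)
    # //Trim for returns at start and end (each pair also removes the sentinel)
    for edge in (SENTINEL + CR * 2, SENTINEL + CR, CR * 2 + SENTINEL, CR + SENTINEL):
        text = text.replace(edge, "")
    # Last pair [ "Ÿ" ; "" ]: drop any sentinel still left
    return text.replace(SENTINEL, "")


print(repr(collapse_returns("\r\r\ra\r\r\r\r\r\r\rb\r")))  # 'a\rb'
print(repr(collapse_returns("a\r\r\r\r\rb", "double")))   # 'a\r\rb'
```

Because the last pair removes every Ÿ, a genuine Ÿ in the source text is lost as well.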

        • 1. Re: Custom function to take away html code
          schamblee

               I'm not 100% sure, but it looks like <br> is transformed to a carriage return (¶), and then all carriage returns are removed at the end of the script.
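The order of the pairs matters here. As far as I know, FileMaker applies the pairs of a multi-pair Substitute one after another, each on the result of the previous one, so the very first pair, [ "¶" ; "" ], removes the returns already in the HTML before [ "<br>" ; "¶" ] runs further down. A rough Python model of that ordering (substitute is a hypothetical stand-in, not a FileMaker function):

```python
CR = "\r"  # FileMaker's ¶ (carriage return)


def substitute(text: str, *pairs: tuple[str, str]) -> str:
    """Mimic a multi-pair FileMaker Substitute: each pair runs on the previous result."""
    for search, replace in pairs:
        text = text.replace(search, replace)
    return text


html = "First line" + CR + "still first<br>Second line"
result = substitute(
    html,
    (CR, ""),        # //Blank: existing returns are removed first
    ("<br>", CR),    # //Transform: later, <br> becomes a return
    ("<br />", CR),
)
print(repr(result))  # 'First linestill first\rSecond line'
```

In this model <br> does come out as a return; the returns that disappear are the ones that were already in the source.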

          • 2. Re: Custom function to take away html code
            philmodjunk

                 Yep, that's what it does. If you look at the pairs of values listed in square brackets at the end of the function definition, you'll find both the Br and \Br tags. Looks like you can just remove those to get this to do what you need.

            • 3. Re: Custom function to take away html code
              PeterMontague

                   I can see [ "<br />" ; "¶" ] but not [ "</br>" ; "¶" ]. If I eliminate [ "<br />" ; "¶" ], will this work?

              • 4. Re: Custom function to take away html code
                schamblee

                     I'm not an HTML expert, but I don't think /br is HTML code for anything. <br> is a line break in HTML; <br /> is the XHTML form, where the slash closes the tag. That being said, I would think [ "<br>" ; "¶" ] ; would need to be removed. I believe the / before br is really being ignored. I would make a backup of my HTML before doing any testing.
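What seems to remove "</br>" is the generic pass in the "//Delete HTML tags" step, where $SearchStart = "<" and $SearchEnd = ">" and HTMLtoText_deleteTags cuts out every tag that is still left. No pair mentions "</br>", so it is deleted like any unknown tag, with nothing put in its place. Below is a rough Python model of that pass, together with one possible fix: an extra [ "</br>" ; "¶" ] ; pair next to the existing <br> pairs in the //Transform section. The helper names are hypothetical and the fix is untested in FileMaker.

```python
CR = "\r"  # FileMaker's ¶ (carriage return)


def delete_all_tags(text: str) -> str:
    """Model of HTMLtoText_deleteTags with $SearchStart = "<" and $SearchEnd = ">"."""
    while True:
        start = text.find("<")
        if start == -1:
            return text
        end = text.find(">", start + 1)
        if end == -1:
            return text  # unbalanced "<": leave the rest of the text alone
        text = text[:start] + text[end + 1:]


def convert_breaks(text: str, pairs: list[tuple[str, str]]) -> str:
    """Apply the break pairs in order, then run the generic tag-deleting pass."""
    for search, replace in pairs:
        text = text.replace(search, replace)
    return delete_all_tags(text)


original_pairs = [("<br>", CR), ("<br />", CR)]
fixed_pairs = original_pairs + [("</br>", CR)]  # hypothetical extra pair

sample = "One</br>Two<br>Three"
print(repr(convert_breaks(sample, original_pairs)))  # 'OneTwo\rThree'
print(repr(convert_breaks(sample, fixed_pairs)))     # 'One\rTwo\rThree'
```

Removing the two <br> pairs instead would make <br> fall through to the same pass and disappear too.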

                      
                • 5. Re: Custom function to take away html code
                  PeterMontague

                       Thanks for the tip. I'll substitute <br> for all of my </br>.

                  • 6. Re: Custom function to take away html code
                    philmodjunk

                         Sorry for the typo in my post, but I found this in your post:

                            [ "</p>" ; "¶¶" ] ;
                            [ "<p>" ; "¶¶" ] ;
                            [ "<p " ; "¶¶<p " ] ;
                            [ "<br>" ; "¶" ] ;
                            [ "<br />" ; "¶" ] ;
                            [ "<tr " ; "¶<tr " ] ;
                            [ "<ul>" ; "¶¶" ] ;
                          
                         I assumed the two <br> pairs, highlighted in red, were the ones you were looking for.
                    • 7. Re: Custom function to take away html code
                      PeterMontague

                           I deleted these two pairs from my custom function, but it is still removing every </br> from my text.

                           I thought </br> was an HTML line break. Am I wrong?

                      • 8. Re: Custom function to take away html code
                        philmodjunk

                             <br> is an HTML line break.
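For what it's worth, the WHATWG HTML parsing rules treat a stray </br> end tag as if it were a <br> start tag, so browsers render it as a line break even though it is not valid markup. If the source mixes spellings, one option is to normalize all of them to a single return before the rest of the cleanup. Here is a Python sketch with the standard re module; the pattern is an assumption and deliberately skips <br> tags that carry attributes.

```python
import re

CR = "\r"  # FileMaker's ¶ (carriage return)

# Matches <br>, <br/>, <br />, </br> and upper-case variants, but not <br class="x">
BREAK_TAG = re.compile(r"<\s*/?\s*br\s*/?\s*>", re.IGNORECASE)


def normalize_breaks(html: str) -> str:
    """Replace every plain line-break tag with a FileMaker-style return."""
    return BREAK_TAG.sub(CR, html)


print(repr(normalize_breaks("a<br>b</br>c<BR />d<br/>e")))  # 'a\rb\rc\rd\re'
```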

                        • 9. Re: Custom function to take away html code
                          PeterMontague

                               I deleted the references to

                                  [ "<br>" ; "¶" ] ;
                                  [ "<br />" ; "¶" ] ;
                                
                               For some reason this made no difference: it still deletes those characters and replaces them with nothing. I tried deleting many parts of the function, and it still doesn't leave the <br> behind. It might have something to do with a related custom function; there are two, and here they are:
                                    // Remove text between $SearchStart and $SearchEnd including $SearchStart and $SearchEnd
                                    // Version: 1.06.
                                    // Parameters: "text" = HTML text to be converted.
                                    // NOTE: Used by HTMLtoText (text;returns) together with HTMLtoText_convertStyle (text)
                                    // 24oct06 ; Mogens Brun ; FM Integrator ; mogens_brun@mac.com
                                    // Mail to mogens_brun@mac.com if you want a demo file with the custom function.
                                    Let (
                                          [
                                          $Start = Position ( text ; $SearchStart ; 1 ; 1 ) ;
                                          $End = Position ( text ; $SearchEnd ; $Start + 1 ; 1 ) + Length ( $SearchEnd ) ;
                                          $Start = If ( $End < $Start ; 0 ; $Start )
                                          ] ;
                                          If (  $Start = 0 ; text ;
                                          HTMLtoText_deleteTags ( Left ( text ; $Start - 1 ) & Middle ( text ; $End ; Length ( text ) - $End + 1 ) ) )
                                    )
                                     
                                    And
                                     
                                         // Convert bold and italic text between $SearchStart and $SearchEnd.
                                         // Version: 1.06.
                                         // Parameters: "text" = HTML text to be converted.
                                         // NOTE: Used by HTMLtoText (text;returns) together with HTMLtoText_deleteTags (text)
                                         // 24oct06 ; Mogens Brun ; FM Integrator ; mogens_brun@mac.com ; 08jun10 ; David Fennell corrected $Return2Post to $ReturnPost
                                         // Mail to mogens_brun@mac.com if you want a demo file with the custom function.
                                          
                                         Let (
                                               [
                                               $Start = Position ( text ; $SearchStart ; 1 ; 1 ) ;
                                               $StartLength = Position ( text ; ">" ; $Start ; 1 ) - $Start + 1 ;
                                               $End = Position ( text ; $SearchEnd ; 1 ; 1 ) ;
                                               $EndLength = Length ( $SearchEnd ) ;
                                               $Start = If ( $End < $Start ; 0 ; $Start ) ;
                                               $ReturnPre = If ( $SearchStart = "<h1" or $SearchStart = "<h2" or $SearchStart = "<h3" ; "<p>" ; "" ) ;
                                               $ReturnPost = If ( $SearchStart = "<h1" or $SearchStart = "<h2" or $SearchStart = "<h3" ; "<br>" ; "" ) ;
                                               $StyleString = $ReturnPre & Middle ( text ; $Start + $StartLength ; $End - $Start - $StartLength ) & $ReturnPost
                                               ] ;
                                               If (  $Start = 0 ; text ;
                                               HTMLtoText_convertStyle ( Left ( text; $Start - 1 ) & TextStyleAdd ( $StyleString ; Bold ) & Middle ( text ; $End + $EndLength ; Length ( text ) - $End - $EndLength + 1 ) ) )
                                         )
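For reference, HTMLtoText_deleteTags finds the first $SearchStart, then the first $SearchEnd after it, cuts out everything between them (both delimiters included) and calls itself on the result until no $SearchStart is left. Neither helper mentions br, so "</br>" is not special-cased anywhere; it falls through to the final "<" to ">" pass. Below is a Python model of the same loop, written iteratively instead of recursively; delete_spans is a hypothetical name, and re.IGNORECASE stands in for FileMaker's case-insensitive Position().

```python
import re


def delete_spans(text: str, search_start: str, search_end: str) -> str:
    """Python model of HTMLtoText_deleteTags: drop start...end spans, delimiters included."""
    start_re = re.compile(re.escape(search_start), re.IGNORECASE)
    end_re = re.compile(re.escape(search_end), re.IGNORECASE)
    while True:
        start = start_re.search(text)
        if start is None:
            return text
        # Like Position ( text ; $SearchEnd ; $Start + 1 ; 1 ): search after the start position
        end = end_re.search(text, start.start() + 1)
        if end is None:
            return text
        text = text[: start.start()] + text[end.end():]


page = "<p>Keep</p><SCRIPT>drop()</script><p>this</p>"
no_script = delete_spans(page, "<script", "</script>")
print(no_script)                          # <p>Keep</p><p>this</p>
print(delete_spans(no_script, "<", ">"))  # Keepthis
```

Also note that HTMLtoText_convertStyle wraps h1 to h3 content in "<p>" and "<br>" ($ReturnPre and $ReturnPost), so the [ "<br>" ; "¶" ] pair is also what puts a line break after each heading.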

                                

                          • 10. Re: Custom function to take away html code
                            ChichoSpit

                                 Hi guys, 

                                 I don't know if this is your problem:

                                 I had a problem parsing text with this function: the result was all the text in one block.

                                 I just edited the custom function and changed this:

                             [ Char ( 10 ) ; "" ] ;

                                 to this:

                             [ Char ( 10 ) ; "¶" ] ;

                                  

                                 Sorry about my English.
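This likely explains the one-block result. HTML files often separate lines with a line feed, Char ( 10 ), or with CR+LF, while FileMaker's ¶ is a carriage return. The //Blank section first removes every ¶ and then, with [ Char ( 10 ) ; "" ], every line feed, so no line break from the source survives; only tags such as <p> or <br> produce returns. Mapping Char ( 10 ) to "¶" keeps one return per source line break (line breaks inside the markup become returns too, and the final "Clean successive returns" step then trims the runs). Here is a rough Python model of the two pairs; blank_line_breaks is a hypothetical name.

```python
CR, LF = "\r", "\n"  # FileMaker's ¶ is CR, Char ( 13 ); Char ( 10 ) is LF


def blank_line_breaks(text: str, lf_replacement: str) -> str:
    """Model of [ "¶" ; "" ] followed by [ Char ( 10 ) ; lf_replacement ]."""
    text = text.replace(CR, "")  # first //Blank pair: drop existing returns
    return text.replace(LF, lf_replacement)  # then the Char ( 10 ) pair


source = "Line one\r\nLine two\nLine three"
print(repr(blank_line_breaks(source, "")))  # 'Line oneLine twoLine three'
print(repr(blank_line_breaks(source, CR)))  # 'Line one\rLine two\rLine three'
```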

                                  

                            • 11. Re: Custom function to take away html code
                              PeterMontague

                                   Thank you Chico. I'll try that out. Your English is fine by the way.