18 Replies Latest reply on Jan 22, 2013 1:36 AM by PeterMontague

    How to parse out categories from a long sequence of html code

    PeterMontague

           I want to parse out the categories as shown below.

           The categories start with "">Books</a> > <a href="...."

           I have a script which can clean up the html tags after I parse this piece out.

           Any ideas?

            

        <h2>Look for similar items by category</h2>

        

        <div class="content">

        <ul>

         <li><a href="/books-used-books-textbooks/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=266239">Books</a> > <a href="/Art-Architecture-Photography-Books/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=91">Art, Architecture & Photography</a></li>

         <li><a href="/books-used-books-textbooks/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=266239">Books</a> > <a href="/Computers-Internet-Books/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=71">Computing & Internet</a> > <a href="/Digital-Photography-Computers-Internet-Books/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=13731411">Digital Photography</a></li>

         <li><a href="/books-used-books-textbooks/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=266239">Books</a> > <a href="/Computers-Internet-Books/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=71">Computing & Internet</a> > <a href="/New-Computing-Computers-Internet-Books/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=403958">New to Computing</a> > <a href="/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=3443301">Digital Music, Photography & Video</a> > <a href="/Digital-Photography-Music-Video-Books/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=3441991">Digital Photography</a></li>

         <li><a href="/books-used-books-textbooks/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=266239">Books</a> > <a href="/Computers-Internet-Books/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=71">Computing & Internet</a> > <a href="/Software-Graphics-Computers-Internet-Books/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=269870">Software & Graphics</a> > <a href="/Graphics-Multimedia-Software-Books/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=269938">Graphics & Multimedia</a> > <a href="/Image-Manipulation-Creation-Graphics-Books/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=404150">Image Manipulation & Creation</a> > <a href="/Digital-Photography-Graphics-Books/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=269951">Digital Photography</a></li>

         <li><a href="/books-used-books-textbooks/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=266239">Books</a> > <a href="/Computers-Internet-Books/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=71">Computing & Internet</a> > <a href="/Software-Graphics-Computers-Internet-Books/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=269870">Software & Graphics</a> > <a href="/Graphics-Multimedia-Software-Books/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=269938">Graphics & Multimedia</a> > <a href="/Image-Manipulation-Creation-Graphics-Books/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=404150">Image Manipulation & Creation</a> > <a href="/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=269963">Scanning</a></li>

        </ul>

        </div>

      </div>

        • 1. Re: How to parse out categories from a long sequence of html code
          davidanders

               You did not mention your version of FileMaker or your operating system.

          http://www.soliantconsulting.com/blog/2011/12/how-to-parse-json-or-any-other-language-in-filemaker
          How to Parse JSON (or any other language) in FileMaker

               This will work if you have FileMaker Pro Advanced, versions 7 through 12.

               There is a custom function here. It has to be installed in your file before you can use it.
          http://www.briandunning.com/cf/559
          ParseData ( theText ; theStartTag ; theEndTag ; theOccurance)

               This is specific to Macs
          http://macscripter.net/viewtopic.php?id=38432
               Parsing website html using applescript

          • 2. Re: How to parse out categories from a long sequence of html code
            philmodjunk

                 Would the first two categories be these?

            Art-Architecture-Photography-Books
            Computers-Internet-Books

            Would these categories be parsed into separate records of a related table?

            Is the number of "categories" fixed or likely to differ each time?

            This looks like a case for a looping script in which variables are updated on each pass, so that they can serve as the start value in the Position function when extracting the next category into a newly created related record.
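The looping idea can be sketched outside FileMaker. The Python below is only an illustration of the logic, not FileMaker code: `extract_items` is a made-up name, and the `<li>` and `</li>` delimiters are an assumption based on the HTML in the original post. Here `str.find` stands in for the Position function, and `pos` is the variable that gets updated on each pass.

```python
def extract_items(html, start_tag="<li>", end_tag="</li>"):
    """Collect every piece of text found between start_tag and end_tag."""
    items = []
    pos = 0  # where the next search begins; updated on every pass
    while True:
        start = html.find(start_tag, pos)
        if start == -1:
            break  # no further start tag, so the loop is done
        start += len(start_tag)
        end = html.find(end_tag, start)
        if end == -1:
            break  # unterminated item; stop rather than guess
        items.append(html[start:end])
        pos = end + len(end_tag)  # the next search starts after this item
    return items


sample = "<ul><li>first</li>\n<li>second</li></ul>"
print(extract_items(sample))
```

In a FileMaker script, the same shape would be a Loop with an Exit Loop If step once Position returns 0.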

            • 3. Re: How to parse out categories from a long sequence of html code
              PeterMontague

              The first category is Books > Art-Architecture-Photography-Books

              The next category is Books > Computers > internet books

              The number of categories differs each time.

              • 4. Re: How to parse out categories from a long sequence of html code
                PeterMontague

                     I'm using FileMaker Pro 11 on Mac OS X.

                     I also have FileMaker Pro Advanced on a PC. If I add the custom function to my database in Advanced, will it then work in FileMaker Pro?

                • 5. Re: How to parse out categories from a long sequence of html code
                  philmodjunk

                       You need Advanced to create a function, but once created and installed in a file, it can be used by FileMaker Pro as well.

                       Next question:

                       How will you store this variable list of information once you have extracted it?

                       Hopefully each one in a separate record? (That's the best way for managing a variable list of information in almost all cases.)

                  • 6. Re: How to parse out categories from a long sequence of html code
                    PeterMontague

                         I want my result to be a carriage return separated list of categories.

                         Then I will be able to parse each list of categories into fields called category 1, category 2 etc.
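As a rough illustration of that target format, the Python sketch below turns already-extracted `<li>` fragments into a return-separated list of category paths. It is not FileMaker code, and `category_paths` is a hypothetical name. The regular expression is a simple tag stripper that assumes no `>` appears inside an attribute value, which holds for the HTML posted above but not for HTML in general.

```python
import html
import re


def category_paths(items):
    """Strip tags from each <li> fragment and return one category path per line."""
    paths = []
    for item in items:
        text = re.sub(r"<[^>]+>", "", item)  # drop the <a ...> and </a> tags
        text = html.unescape(text)           # turn entities such as &amp; into &
        text = " ".join(text.split())        # collapse runs of whitespace
        paths.append(text)
    return "\n".join(paths)


fragment = (
    '<a href="/books/b?ie=UTF8&amp;node=1">Books</a> > '
    '<a href="/art/b?ie=UTF8&amp;node=2">Art, Architecture & Photography</a>'
)
print(category_paths([fragment]))
```

Each line of the result can then be split on " > " to fill the individual category fields.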

                          

                    • 7. Re: How to parse out categories from a long sequence of html code
                      PeterMontague

                           I adapted the custom function you mentioned. Would you mind checking whether I've done so correctly? Once I've added the custom function, where can I find it? Is it something I can use in a script step?

                           // ParseData ( theText; theStartTag; theEndTag; theOccurance)
                           //
                           // Extract the text between two strings.
                           //
                           // Parameters:
                           // theText = the text to parse
                           // theStartTag = the string that comes before the text to extract
                           // theEndTag = the string that comes after the text to extract
                           // theOccurance = the instance of the text to extract
                           //
                           // Return Value:
                           // the instance of text found in theText between theStartTag and theEndTag based on theOccurance
                           //

                           Let ( [

                           theStartPos = Position ( this::Child Source Code ; "<h2>Look for similar items by category</h2>" ; 1 ; 1 ) ;
                           theResult = Case (


                           // ------------------------------
                           // If theStartTag was not found, return an empty string.
                           theStartPos = 0 ; "" ;
                           // ------------------------------


                           // ------------------------------
                           // If theStartTag was found, get the string we are looking for.
                           theStartPos > 0 ;
                           Let ( [
                           theStartPos = theStartPos + Length ( "<h2>Look for similar items by category</h2>" ) ;
                           theEndPos = Position ( this::Child Source Code ; "</a></li>
                             </ul>
                             </div>
                           </div>"
                            ; theStartPos ; 1 ) ;
                           theLengthToKeep = theEndPos - theStartPos;
                           theResult = Middle ( this::Child Source Code ; theStartPos ; theLengthToKeep )
                           ] ;
                           theResult
                           )
                           // ------------------------------


                           ) // End case

                           ] ;

                           theResult

                           )

                      • 8. Re: How to parse out categories from a long sequence of html code
                        philmodjunk
                             

                                  I want my result to be a carriage return separated list of categories.

                             

                                  Then I will be able to parse each list of categories into fields called category 1, category 2 etc.

                             This does not look like the optimum way to store this information. For one thing, the number of category fields that you define will limit the number of categories that you can successfully parse from this text. If you populate a table of related records with this information, you can store any number of categories without needing a dedicated field for each. It is also (for many common sort/search/count/reporting tasks) much easier to work with such a list of data when it is stored in a table of related records.

                        • 9. Re: How to parse out categories from a long sequence of html code
                          PeterMontague

                               I only want to choose the first three categories out of the list of categories. If there are more, I don't mind. Plus, making a list of related records sounds complicated.

                               I have a Set Field script which fills the field with the result of this function. Unfortunately it's coming up blank. Any ideas what is wrong with the coding of the custom function?

                          • 10. Re: How to parse out categories from a long sequence of html code
                            philmodjunk

                                 It isn't really any more complicated than building your return-separated list, and it is more flexible to work with in a number of different ways. (And your return-separated list is very easy to produce from the related table should it be needed.)

                                 For the custom function, have you posted exactly what you find in the function editor?

                                 I'm asking that because this expression:

                                 theStartPos = Position ( this::Child Source Code ; "<h2>Look for similar items by category</h2>" ; 1 ; 1 )

                                 Should be:

                                 theStartPos = Position ( theText ; theStartTag ; 1 ; 1 )

                                 and there's a similar line further down with the same issue.
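To show what the parameter-only version looks like, here is a minimal Python sketch that loosely mirrors the ParseData signature. It is not the custom function itself: `parse_data` and its argument names are stand-ins, and returning an empty string when the end tag is missing is a choice made for this sketch. Every search uses the values passed in, never a hard-coded field or literal.

```python
def parse_data(the_text, the_start_tag, the_end_tag, the_occurrence=1):
    """Return the text between the Nth the_start_tag and the next the_end_tag."""
    if the_occurrence < 1:
        return ""
    # Walk forward to the Nth occurrence of the start tag.
    start = -1
    for _ in range(the_occurrence):
        start = the_text.find(the_start_tag, start + 1)
        if start == -1:
            return ""  # the start tag does not occur that many times
    start += len(the_start_tag)
    # Look for the end tag only after the start tag.
    end = the_text.find(the_end_tag, start)
    if end == -1:
        return ""  # no end tag after the start tag
    return the_text[start:end]


page = "<h2>Look for similar items by category</h2><ul><li>Books</li></ul>"
print(parse_data(page, "<h2>Look for similar items by category</h2>", "</ul>"))
```

The caller supplies the field contents and both tags, so the same function serves any field and any pair of delimiters.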

                            • 11. Re: How to parse out categories from a long sequence of html code
                              PeterMontague

                                   Hi Phil,

                                   You mentioned earlier on that I should store the categories in separate records. How should I go about doing this?

                                   Also you mentioned earlier that I should have:

                                   

                              theStartPos = Position ( theText ; theStartTag ; 1 ; 1 )

                                   instead of 

                                    

                                   theStartPos = Position ( this::Child Source Code ; "<h2>Look for similar items by category</h2>" ; 1 ; 1 )

                                   Aren't they the same as each other, with one being generic?
                                    
                              • 12. Re: How to parse out categories from a long sequence of html code
                                philmodjunk

                                     A custom function is normally intended to work with a variety of inputs passed to it as parameters. The two expressions are the same only if you never want to use anything but this::Child Source Code and "<h2>Look for similar items by category</h2>" with your function, and if that's the case, you don't need any parameters. The advantage of parameters is that if you need to change the text being searched or the text used in the search, you only modify the values passed to the function instead of redefining the custom function. Parameters also let you use the same function with more than one field and more than one search text.

                                • 13. Re: How to parse out categories from a long sequence of html code
                                  PeterMontague

                                       Do I need to restructure my relationships and set up a new table to store the categories in different records the way you suggested?

                                       Here is how my database, as you may remember, looks now.

                                  • 14. Re: How to parse out categories from a long sequence of html code
                                    philmodjunk

                                         Did you upload a PDF instead of a gif, jpeg or png file?

                                         You'll need to define a relationship linking this table to your inventory table--presumably by product_id (ISBN number).

                                         A script, to create the category record might look like this:

                                         Set variable [$ISBN ; value: Table::product_id ]
                                         Set variable [$Category ; //put your calc for the category here]
                                         Go to Layout [Category]
                                         New Record/Request
                                         Set Field [Category::Product_ID ; $ISBN]
                                         Set Field [Category::Category ; $Category]
                                         Go to Layout [original layout]
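The one-record-per-category idea can be modelled in a few lines of Python. This is only an analogy for the related table, not FileMaker code: `CategoryRecord`, `make_category_records` and `categories_for` are hypothetical names, and "EXAMPLE-ID" is a placeholder rather than a real product ID.

```python
from dataclasses import dataclass


@dataclass
class CategoryRecord:
    product_id: str  # stands in for Category::Product_ID
    category: str    # stands in for Category::Category


def make_category_records(product_id, category_list):
    """Create one record per non-empty line of a return-separated list."""
    return [
        CategoryRecord(product_id, line)
        for line in category_list.split("\n")
        if line.strip()
    ]


def categories_for(records, product_id):
    """Rebuild the return-separated list for one product from its records."""
    return "\n".join(r.category for r in records if r.product_id == product_id)


records = make_category_records(
    "EXAMPLE-ID",
    "Books > Art, Architecture & Photography\nBooks > Computing & Internet",
)
print(len(records))
print(categories_for(records, "EXAMPLE-ID"))
```

Any number of categories fits without adding fields, and the return-separated list can still be rebuilt from the records when it is needed.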
