How to parse out categories from a long sequence of HTML code

Question asked by PeterMontague on Sep 12, 2012
Latest reply on Jan 21, 2013 by PeterMontague

     I want to parse the categories out of the HTML shown below.

     Each category path starts with the string ">Books</a> > <a href="...."

     I already have a script that strips the HTML tags once this piece has been extracted.

     Any ideas?

      

  <h2>Look for similar items by category</h2>

  

  <div class="content">

  <ul>

   <li><a href="/books-used-books-textbooks/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=266239">Books</a> > <a href="/Art-Architecture-Photography-Books/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=91">Art, Architecture & Photography</a></li>

   <li><a href="/books-used-books-textbooks/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=266239">Books</a> > <a href="/Computers-Internet-Books/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=71">Computing & Internet</a> > <a href="/Digital-Photography-Computers-Internet-Books/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=13731411">Digital Photography</a></li>

   <li><a href="/books-used-books-textbooks/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=266239">Books</a> > <a href="/Computers-Internet-Books/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=71">Computing & Internet</a> > <a href="/New-Computing-Computers-Internet-Books/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=403958">New to Computing</a> > <a href="/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=3443301">Digital Music, Photography & Video</a> > <a href="/Digital-Photography-Music-Video-Books/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=3441991">Digital Photography</a></li>

   <li><a href="/books-used-books-textbooks/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=266239">Books</a> > <a href="/Computers-Internet-Books/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=71">Computing & Internet</a> > <a href="/Software-Graphics-Computers-Internet-Books/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=269870">Software & Graphics</a> > <a href="/Graphics-Multimedia-Software-Books/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=269938">Graphics & Multimedia</a> > <a href="/Image-Manipulation-Creation-Graphics-Books/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=404150">Image Manipulation & Creation</a> > <a href="/Digital-Photography-Graphics-Books/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=269951">Digital Photography</a></li>

   <li><a href="/books-used-books-textbooks/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=266239">Books</a> > <a href="/Computers-Internet-Books/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=71">Computing & Internet</a> > <a href="/Software-Graphics-Computers-Internet-Books/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=269870">Software & Graphics</a> > <a href="/Graphics-Multimedia-Software-Books/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=269938">Graphics & Multimedia</a> > <a href="/Image-Manipulation-Creation-Graphics-Books/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=404150">Image Manipulation & Creation</a> > <a href="/b/ref=dp_brlad_entry/275-7979289-3498506?ie=UTF8&amp;node=269963">Scanning</a></li>

  </ul>

  </div>

</div>
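
One possible approach, sketched here in Python only as an assumption, since the post does not say which tool or language is in use: instead of searching for the ">Books</a>" string, read each <li> element and collect the text of the links inside it. The names CategoryParser and extract_categories are hypothetical, and the sketch assumes the input is just the "Look for similar items by category" fragment, so that every <li> it sees is a category path.

```python
from html.parser import HTMLParser


class CategoryParser(HTMLParser):
    """Collect the link texts inside each <li> as one category path."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paths = []        # finished paths, one list of names per <li>
        self._current = None   # names gathered for the <li> being read
        self._in_link = False  # True while inside an <a> element
        self._text = []        # text chunks of the <a> being read

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._current = []
        elif tag == "a" and self._current is not None:
            self._in_link = True
            self._text = []

    def handle_data(self, data):
        # Text outside links (the " > " separators) is ignored.
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self._in_link = False
            self._current.append("".join(self._text).strip())
        elif tag == "li" and self._current is not None:
            if self._current:
                self.paths.append(self._current)
            self._current = None


def extract_categories(html):
    """Return one 'A > B > C' string per <li> found in the fragment."""
    parser = CategoryParser()
    parser.feed(html)
    parser.close()
    return [" > ".join(path) for path in parser.paths]


# Shortened sample in the same shape as the fragment in the post.
sample = (
    '<ul>'
    '<li><a href="/x?ie=UTF8&amp;node=1">Books</a> > '
    '<a href="/y">Art, Architecture & Photography</a></li>'
    '<li><a href="/x">Books</a> > <a href="/z">Computing & Internet</a> > '
    '<a href="/w">Digital Photography</a></li>'
    '</ul>'
)

for line in extract_categories(sample):
    print(line)
```

Because the link text is read directly, no separate tag clean-up step should be needed for this part. If the whole page is fed in rather than the fragment, the sketch would also pick up unrelated <li> elements, so the fragment would have to be isolated first.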
