Hi all, I'm new here. I'm using the jsoup library
to read HTML from a web page. This is my basic HTML:
<ul id="nav" class="sf-menu">
<li class="level0 nav-3 level-top parent">
<a href="mylink.html"
class="level-top"><span>ABCD_MAIN CAT</span></a>
<ul class="level0">
<li id="level1nav-3-1first"><a class="arrow"
href="mylink.html">SUB CAT
</a>
<ul>
<li><span><a
href="mylink.html">SUB TO SUB CAT1
</span></a></li>
<li><span><a
href="mylink.html">
SUB TO SUB CAT2</span></a></li>
</ul>
</li>
<li class="level1 nav-3-1 first"><a href="mylink.html">
<span>SUB CAT(HERE NO SUB TO SUB CAT)</span></a>
</li>
<li><a href="mylink.html" class="see-all"><span>SUB CAT(HERE NO SUB TO SUB CAT)</span></a>
</li>
</ul>
</li>
</ul>
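From this I need to read every category (cat) with its link, each sub category with its link, and each sub-to-sub category with its link.
How can I do that?
Please help.
Thanks in advance...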
Answer 0 (score: 0)
You can parse it as follows:
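import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

String webpageContent = "<your html page>"; // placeholder: put the page's HTML here
Document doc = Jsoup.parseBodyFragment(webpageContent);
Elements liTags = doc.select("li"); // selects all <li> tags, at every nesting level
for (Element liTag : liTags) {
    // Parse each liTag to get the content you want;
    // you can use liTag.attr(...), liTag.html(), or liTag.outerHtml().
}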
Refer to this link for other properties of the Element class.
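Since select("li") returns the items of every level in one flat list, the category / sub category / sub-to-sub category relationship is easier to keep if you walk the nested ul elements recursively. Here is a minimal sketch of that idea, not a definitive implementation. The class name MenuWalker, the helpers listCategories and walk, and the sample markup with its href values are all made up for illustration; the sample is a trimmed, well-formed stand-in for the menu in the question. It assumes the menu has id="nav" and that each entry's own link comes before its nested ul.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class MenuWalker {

    // Hypothetical sample: a shortened, well-formed version of the menu.
    // The href values are invented so that each level is distinguishable.
    public static final String SAMPLE_HTML =
        "<ul id=\"nav\">"
        + "<li><a href=\"main.html\"><span>MAIN CAT</span></a>"
        +   "<ul>"
        +     "<li><a href=\"sub1.html\">SUB CAT 1</a>"
        +       "<ul><li><span><a href=\"subsub1.html\">SUB TO SUB CAT 1</a></span></li></ul>"
        +     "</li>"
        +     "<li><a href=\"sub2.html\"><span>SUB CAT 2</span></a></li>"
        +   "</ul>"
        + "</li>"
        + "</ul>";

    /** Returns one "indent + link text -> href" line per menu entry, parents before children. */
    public static List<String> listCategories(String html) {
        Document doc = Jsoup.parse(html);
        Element nav = doc.getElementById("nav"); // assumes the menu root has id="nav"
        List<String> out = new ArrayList<>();
        if (nav != null) {
            walk(nav, "", out);
        }
        return out;
    }

    private static void walk(Element ul, String indent, List<String> out) {
        // children() yields direct children only, so each level is visited exactly once.
        for (Element li : ul.children()) {
            if (!li.tagName().equals("li")) {
                continue;
            }
            // The first <a> in document order is taken as this entry's own link;
            // this relies on the link appearing before any nested <ul>.
            Element link = li.select("a").first();
            if (link != null) {
                out.add(indent + link.text() + " -> " + link.attr("href"));
            }
            // Descend into nested lists with a deeper indent.
            for (Element child : li.children()) {
                if (child.tagName().equals("ul")) {
                    walk(child, indent + "  ", out);
                }
            }
        }
    }

    public static void main(String[] args) {
        for (String line : listCategories(SAMPLE_HTML)) {
            System.out.println(line);
        }
    }
}
```

Each extra level of indentation in the printed lines marks one level of nesting. For a real page you would pass the downloaded HTML instead of SAMPLE_HTML; if the hrefs are relative, you will probably also want to parse with a base URI so they can be resolved to absolute URLs.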