I am trying to scrape the following HTML, and I only want the "Some Header" part, not "additional Info".
<li class="media">
<div class="media-body">
<a href="url.html"> <h4> Some Header <span class="label label-info"> additional Info </span> </h4> </a> Address info
<br>
</div> </li>
This is what I am trying:
val li: Elements = ul.select("li")
val list: Elements = li.select("a")
val headers: Elements = list.select("h4")
Then, when I try to get the inner text with headers.text(), I get both "Some Header" and "additional Info". How can I scrape only the "Some Header" part?
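Answer 0 (score: 3)
You are almost there. You are probably looking for ownText(), which returns only the text that belongs directly to an element and leaves out the text of its child elements: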
String s = "<li class=\"media\"> \n" +
" <div class=\"media-body\"> \n" +
" <a href=\"url.html\"> <h4> Some Header <span class=\"label label-info\"> additional Info </span> </h4> </a> Address info\n" +
" <br> \n" +
" </div> </li>";
Document document = Jsoup.parse(s);
Elements element = document.select("li");
Elements elements = element.select("a");
System.out.println(elements.select("h4").first().ownText());
Output:
Some Header
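Note that first() only handles a single h4. If the list contains several items, one option is to loop over all matched headers and call ownText() on each. The following is a minimal sketch under some assumptions: the class name OwnTextExample, the helper headerTexts, and the second list item are made up for illustration, and jsoup must be on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class OwnTextExample {

    // Hypothetical helper: collects the own text of every <h4> found under "li a".
    // ownText() skips text that belongs to nested elements such as <span>.
    static List<String> headerTexts(String html) {
        Document document = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element h4 : document.select("li a h4")) {
            result.add(h4.ownText());
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed input: two list items shaped like the one in the question.
        String html =
            "<ul>"
            + "<li class=\"media\"><div class=\"media-body\">"
            + "<a href=\"url.html\"><h4> Some Header <span class=\"label label-info\"> additional Info </span></h4></a> Address info<br>"
            + "</div></li>"
            + "<li class=\"media\"><div class=\"media-body\">"
            + "<a href=\"other.html\"><h4> Another Header <span class=\"label label-info\"> more Info </span></h4></a> Address info<br>"
            + "</div></li>"
            + "</ul>";

        System.out.println(headerTexts(html));
    }
}
```

With this input it should print [Some Header, Another Header]. The same loop carries over to the Kotlin code in the question, for example headers.map { it.ownText() }.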