How to select specific text to scrape

Time: 2018-12-30 09:34:10

Tags: kotlin jsoup

I am trying to scrape the following HTML, and I only want to get the Some Header part, not the additional Info:

<li class="media"> 
     <div class="media-body"> 
      <a href="url.html"> <h4> Some Header <span class="label label-info"> additional Info </span> </h4> </a> Address info
      <br> 
     </div> </li>

This is what I have tried:

    val li: Elements = ul.select("li")
    val list: Elements = li.select("a")
    val headers: Elements = list.select("h4")

Then, when I try to get the inner text via headers.text(), I get both parts together: Some Headeradditional Info

How can I scrape only the Some Header part?

1 Answer:

Answer 0: (score: 3)

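You are close to solving this. You are probably looking for ownText(), which returns only the text that belongs directly to the element itself and leaves out the text of its child elements (here, the nested span):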

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.select.Elements;

    String s = "<li class=\"media\"> \n" +
            "     <div class=\"media-body\"> \n" +
            "      <a href=\"url.html\"> <h4> Some Header <span class=\"label label-info\"> additional Info </span> </h4> </a> Address info\n" +
            "      <br> \n" +
            "     </div> </li>";

    Document document = Jsoup.parse(s);
    Elements element = document.select("li");

    Elements elements = element.select("a");
    // ownText() skips the text of the nested <span> and keeps only the <h4>'s own text
    System.out.println(elements.select("h4").first().ownText());

Output:

Some Header
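
Since the question is tagged kotlin, here is a rough Kotlin sketch of the same idea. It assumes jsoup is on the classpath; the names html and headers, the wrapping ul, and the combined selector "li a h4" are illustrative choices, not taken from the original code. Because Elements is a list of Element, mapping ownText() over it gives one string per h4 rather than a single joined string.

```kotlin
import org.jsoup.Jsoup

fun main() {
    // Sample markup from the question, wrapped in a <ul> for illustration (assumption).
    val html = """
        <ul>
          <li class="media">
            <div class="media-body">
              <a href="url.html"> <h4> Some Header <span class="label label-info"> additional Info </span> </h4> </a> Address info
              <br>
            </div>
          </li>
        </ul>
    """.trimIndent()

    val doc = Jsoup.parse(html)

    // ownText() returns the text owned directly by each <h4>,
    // without the text of child elements such as the <span>.
    val headers: List<String> = doc.select("li a h4").map { it.ownText() }

    headers.forEach { println(it) }
}
```

For the sample markup this should print the same Some Header as the Java version above. If a page has several li entries, headers would hold one entry per h4 found.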