Question

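I am trying to fetch the content of a specific div tag from a page (URL). I managed to get the whole tag, but I cannot strip the enclosing div itself.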

Document page_source = Jsoup.connect(current_url).get();
Elements info = page_source.select("div.article#single");

What I get is:

<div class="article" id="single">    
  <p>qwerty</p>
  <p>qwerty</p> 
</div>

I only want:

  <p>qwerty</p>
  <p>qwerty</p>

What is wrong with my code?

Answer 1

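When you print the Elements object directly, you get the whole tag, including the enclosing div. You can use html() to get only the inner content. For example: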

// doc is the parsed Document (page_source in the question)
Elements info = doc.select("div.article#single");

// The whole element, i.e. toString() of Elements (the outer HTML)
// Prints : <div class="article" id="single"> 
//      <p>qwerty</p> 
//      <p>qwerty</p> 
//      </div>
System.out.println(info);


// The html content
// Prints : <p>qwerty</p> 
//      <p>qwerty</p>
System.out.println(info.html());


// The combined text of the matched elements, with tags stripped
// Prints : qwerty qwerty
System.out.println(info.text());
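
For completeness, here is a minimal self-contained sketch of the same idea that runs without a network connection. It assumes jsoup 1.11.1 or later is on the classpath (for selectFirst). The class name InnerHtmlDemo and the helper innerHtml are made up for illustration and are not part of jsoup, and the HTML is parsed from a string with Jsoup.parse instead of being fetched with Jsoup.connect.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class InnerHtmlDemo {

    // Hypothetical helper: returns the inner HTML of the first element
    // matching cssQuery, or an empty string when nothing matches.
    static String innerHtml(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        Element match = doc.selectFirst(cssQuery); // null if there is no match
        return match == null ? "" : match.html();
    }

    public static void main(String[] args) {
        String html = "<div class=\"article\" id=\"single\">"
                + "<p>qwerty</p><p>qwerty</p></div>";

        // Prints only the two <p> elements, without the enclosing <div>
        System.out.println(innerHtml(html, "div.article#single"));
    }
}
```

Working on a single Element rather than on Elements makes the "no match" case explicit. Elements.html() would instead join the inner HTML of every matched element, which is fine here because the id selector should match at most one div.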

Getting the inner content of a parent div using Jsoup
