This is the HTML file:
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>Title</title>
  </head>
  <body>
    <h1>Demo</h1>
    <div class="eta">
      <h2>Text</h2>
      <h2 class="strike">Text1</h2>
      <div class="del">
        <p>Text2</p>
      </div>
      <p class="desc">Text3</p>
    </div>
  </body>
</html>
I want to access the first element, Text, inside class="eta". I wrote the following code:
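import java.io.File;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) {
        try {
            File input = new File("/path/sample.html");
            Document doc1 = Jsoup.parse(input, "UTF-8");
            // Collect every element whose class attribute contains "eta"
            Elements details2 = doc1.getElementsByClass("eta");
            // text() returns the combined text of the element and all of its descendants
            String status2 = details2.first().text();
            System.out.println(status2);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}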
The program prints the following: Text Text1 Text2 Text3
However, I want to extract only Text. How can I do that?
Answer 0 (score: 1)
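// "eta" alone would match an <eta> tag; ".eta" matches class="eta"
Elements divs = doc1.select(".eta");
Element firstDiv = divs.get(0);
// text() on the div includes all descendants, so read the first child element instead
System.out.println(firstDiv.child(0).text());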
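For reference, here is a self-contained sketch of the same idea that parses the markup from a string instead of a file, so it can be run without /path/sample.html. The class name FirstChildText and the helper firstChildText are hypothetical names chosen for illustration. The helper returns null when nothing matches, which is just one possible way to handle that case.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class FirstChildText {

    // Hypothetical helper: returns the text of the first child element of the
    // first element carrying the given class, or null if there is no such element.
    static String firstChildText(String html, String className) {
        Document doc = Jsoup.parse(html);
        // first() returns null when the selection is empty
        Element container = doc.getElementsByClass(className).first();
        if (container == null) {
            return null;
        }
        // children() holds only direct child elements, in document order
        Element firstChild = container.children().first();
        return firstChild == null ? null : firstChild.text();
    }

    public static void main(String[] args) {
        // Same structure as the div in the question, inlined for the example
        String html = "<div class=\"eta\">"
                + "<h2>Text</h2>"
                + "<h2 class=\"strike\">Text1</h2>"
                + "<div class=\"del\"><p>Text2</p></div>"
                + "<p class=\"desc\">Text3</p>"
                + "</div>";
        System.out.println(firstChildText(html, "eta")); // prints "Text"
    }
}
```

Going through children().first() rather than child(0) avoids an exception when the matched element has no child elements. If you prefer a CSS selector, doc1.select(".eta > h2").first() should reach the same element for this particular document.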