Extracting text non-recursively with Jsoup

Posted: 2019-12-01 15:48:22

Tags: java html jsoup

This is the code I'm trying to run:

String html = "<a href=\"/name/zola-1\">ZOLA <span class=\"tiny\">(1)</span></a>";

Document doc = Jsoup.parse(html); // parse the HTML string
Element element = doc.getAllElements().first(); // meant to get the name element

System.out.println(element.text()); //prints "ZOLA (1)"
System.out.println(element.ownText()); // prints nothing

My goal is to extract only "ZOLA", without the text of the child nodes, but ownText() prints nothing. What should I do?

2 answers:

Answer 0 (score: 1)

You can use this:

String html = "<a href=\"/name/zola-1\">ZOLA <span class=\"tiny\">(1)</span></a>";
Document doc = Jsoup.parse(html);
Element elementA = doc.selectFirst("a");
System.out.println(elementA.ownText()); // ZOLA
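To make "non-recursive" concrete: ownText() only looks at the text nodes that are direct children of the element, so the "(1)" inside the nested span is skipped. The sketch below does the same thing by hand, walking childNodes() and keeping only TextNode instances. The helper name directText is hypothetical, and unlike ownText() it does not try to reproduce every whitespace rule, it just normalises via TextNode.text() and trims the result.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class DirectText {
    // Hypothetical helper: concatenates only the text nodes that are direct
    // children of el, ignoring any text nested inside child elements.
    static String directText(Element el) {
        StringBuilder sb = new StringBuilder();
        for (Node child : el.childNodes()) {
            if (child instanceof TextNode) {
                sb.append(((TextNode) child).text());
            }
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String html = "<a href=\"/name/zola-1\">ZOLA <span class=\"tiny\">(1)</span></a>";
        Document doc = Jsoup.parse(html);
        Element a = doc.selectFirst("a");
        System.out.println(directText(a)); // ZOLA
        System.out.println(a.ownText());   // ZOLA
    }
}
```

In practice ownText() is the simpler choice; the manual loop is only useful if you need finer control, for example keeping each direct text node separately.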

Answer 1 (score: 1)

The problem is that doc.getAllElements().first() returns the Document itself (the root of the whole parsed tree), which prints as

<html>
 <head></head>
 <body>
  <a href="/name/zola-1">ZOLA <span class="tiny">(1)</span></a>
 </body>
</html>
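One way to check this is to list what getAllElements() actually returns. The sketch below, assuming a recent jsoup version, compares the first match against the Document and collects the tag name of every matched element in document order. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AllElementsOrder {
    // Collects the tag name of every element getAllElements() returns, in order.
    static List<String> tagNames(Document doc) {
        List<String> names = new ArrayList<>();
        for (Element e : doc.getAllElements()) {
            names.add(e.tagName());
        }
        return names;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/name/zola-1\">ZOLA <span class=\"tiny\">(1)</span></a>";
        Document doc = Jsoup.parse(html);
        // The first match is the Document root itself, not the <a> element.
        System.out.println(doc.getAllElements().first() == doc); // true
        System.out.println(tagNames(doc));
    }
}
```

After the root entry, the list continues with html, head and body (added by the parser) and only then reaches a and span, which is why first() never lands on the link.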

while you were expecting

<a href="/name/zola-1">ZOLA <span class="tiny">(1)</span></a>

The following should work for you:

String html = "<a href=\"/name/zola-1\">ZOLA <span class=\"tiny\">(1)</span></a>";

Document doc = Jsoup.parse(html);
Elements links = doc.getElementsByTag("a");
System.out.println(links.get(0));
System.out.println(links.get(0).ownText());

Output:

<a href="/name/zola-1">ZOLA <span class="tiny">(1)</span></a>
ZOLA
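If the real page contains many name links (the question mentions "names"), the same idea extends to a loop over every matching element. The sketch below assumes each name sits in its own a element; the second link ("ZOE", "/name/zoe-2") is made-up sample data, not taken from the question.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AllNames {
    // Returns the own text of every <a> element, skipping nested <span> counts.
    static List<String> names(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        for (Element link : doc.select("a")) {
            result.add(link.ownText());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical fragment: the second link is illustrative sample data.
        String html = "<a href=\"/name/zola-1\">ZOLA <span class=\"tiny\">(1)</span></a>"
                + "<a href=\"/name/zoe-2\">ZOE <span class=\"tiny\">(2)</span></a>";
        System.out.println(names(html)); // [ZOLA, ZOE]
    }
}
```

On a full page you would likely narrow the selector (for example to links inside a specific container) so unrelated a elements are not included.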