Question

I have this weird scenario in my Jsoup project.

This is what the HTML looks like:

<html>
..

<link> example.com </link>
..

</html>

When I try to get the text using Jsoup:

System.out.println(document.select("link").text()); // nothing gets printed (it should print example.com)

But if I change the HTML to:

<html>
..

<someOtherTage> example.com </someOtherTage>
..

</html>

then:

System.out.println(document.select("someOtherTage").text()); // prints example.com

So my question is:

Is this a bug in Jsoup, or is there something special about the tag name "link"?

Note: tried with Jsoup versions 1.6 and 1.9, on Java 7 and 8.

Answer 1

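Because link is an empty (void) element in HTML, it cannot contain any content. Jsoup's HTML parser therefore cleans the element up and moves its content into the body, so the link element itself ends up with no text. (You can verify this by printing the document.)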
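A minimal sketch of that verification, assuming Jsoup is on the classpath. The class name LinkHtmlModeDemo and the exact sample markup (with the link placed inside body) are illustrative assumptions modelled on the question, not the asker's actual code:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class LinkHtmlModeDemo {
    // Hypothetical sample input modelled on the question's HTML.
    static final String HTML =
            "<html><head></head><body><link> example.com </link></body></html>";

    // Text of all <link> elements when parsed with the default HTML parser.
    static String linkText(String html) {
        return Jsoup.parse(html).select("link").text();
    }

    // Text of the whole <body> after HTML parsing.
    static String bodyText(String html) {
        return Jsoup.parse(html).body().text();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        // Print the parsed tree to see that the text is no longer inside <link>.
        System.out.println(doc.outerHtml());
        // The <link> element itself has no text...
        System.out.println("link text: [" + linkText(HTML) + "]");
        // ...but the text is still part of the body.
        System.out.println("body text: [" + bodyText(HTML) + "]");
    }
}
```

In the printed tree the link element appears as an empty tag, with " example.com " as a separate text node outside it.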

To keep the content inside the link element, switch to the XML parser:

Document doc = Jsoup.parse(html, "", Parser.xmlParser());
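A self-contained sketch of the same fix, again assuming Jsoup is on the classpath; the class name LinkXmlModeDemo and the sample string are hypothetical:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public class LinkXmlModeDemo {
    // Parse with the XML parser, which does not apply HTML's void-element
    // rules, so <link> keeps its text child.
    static String linkText(String html) {
        Document doc = Jsoup.parse(html, "", Parser.xmlParser());
        return doc.select("link").text();
    }

    public static void main(String[] args) {
        // Hypothetical input modelled on the question.
        String html = "<html><body><link> example.com </link></body></html>";
        System.out.println(linkText(html)); // example.com
    }
}
```

Keep in mind that the XML parser builds the tree exactly as written, so HTML-specific behaviour such as implied head/body elements is not applied.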