Question

While extracting data with jsoup, I ran into a problem. I have data like this:

This is a <strong>strong</strong> number <date>2013</date>

I want to get this result: This is a number

How can I do this? Can anyone help me?

Answer 1

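You can parse the HTML into a Document, select the body element, and get its text. ownText() returns only the text that belongs directly to the element, while text() also includes the text of all its child elements.

Example: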

Document doc = Jsoup.parse("This is a <strong>strong</strong> number <date>2013</date>");

String ownText = doc.body().ownText();
String text = doc.body().text();

System.out.println(ownText);
System.out.println(text);

Output:

This is a number
This is a strong number 2013
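To make it clearer why ownText() drops "strong" and "2013", here is a minimal sketch of the same idea using only the JDK's built-in DOM parser instead of jsoup. It is not jsoup's implementation: the class name OwnTextSketch and the helper ownText are hypothetical, and the sketch assumes the fragment is well-formed XML (real-world HTML often is not, which is a good reason to use jsoup). It keeps only the text nodes that are direct children of the root and skips every child element together with its content.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class OwnTextSketch {

    // Hypothetical helper: returns the text owned directly by the fragment's root,
    // ignoring the text inside any child element.
    static String ownText(String fragment) {
        try {
            // Assumption: the fragment is well-formed XML once wrapped in a single root.
            String xml = "<root>" + fragment + "</root>";
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Element root = builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();

            StringBuilder sb = new StringBuilder();
            NodeList children = root.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Keep direct text nodes only; <strong> and <date> are element nodes and are skipped.
                if (child.getNodeType() == Node.TEXT_NODE) {
                    sb.append(child.getNodeValue());
                }
            }
            // Collapse the whitespace left behind where the elements used to be.
            return sb.toString().replaceAll("\\s+", " ").trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Fragment is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(ownText("This is a <strong>strong</strong> number <date>2013</date>"));
    }
}
```

With jsoup itself none of this is needed, since doc.body().ownText() from the example above already does the work and copes with malformed HTML.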

Answer 2

This should answer your question:

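public String escapeHtml(String source) {
    Document doc = Jsoup.parseBodyFragment(source);
    // Replace each <b> element with a text node holding its markup,
    // so the tag is escaped in the output instead of being deleted.
    Elements elements = doc.select("b");
    for (Element element : elements) {
        element.replaceWith(new TextNode(element.toString(),""));
    }
    // Clean the result, allowing only <a> tags with the listed attributes.
    // Note: recent jsoup releases rename Whitelist to Safelist.
    return Jsoup.clean(doc.body().toString(), new Whitelist().addTags("a").addAttributes("a", "href", "name", "rel", "target"));
}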

Jsoup - Howto clean html by escaping not deleting the unwanted html?

Answer 3

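// Android only: Html and Spanned come from the android.text package.
Document doc = Jsoup.parse("This is a <strong>strong</strong> number <date>2013</date>");

// Let Android's HTML parser convert the markup to styled text, then take its plain text.
Spanned htmlDoc = Html.fromHtml(doc.toString());
String fromHTML = htmlDoc.toString();

System.out.println(fromHTML);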

How to get text from this HTML tag using jsoup?
