Jsoup.clean() leaves tags unclosed and open

Time: 2015-12-11 07:29:24

Tags: java html xml jsoup

The following code replaces <br /> with <br>:

String removeDisallowedTags(String textToEscape) {
    Whitelist whitelist = Whitelist.none();
    whitelist.addTags(new String[] { "b", "br", "font" });

    String safe = Jsoup.clean(textToEscape, whitelist);
    return safe;
}

Why?

1 answer:

Answer 0 (score: 4)

By default, Jsoup.clean() treats the document as HTML, and in HTML <br> is a void element that must not have an end tag. The same goes for <img>.
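To make the rule concrete, here is a toy model of how a serializer might pick the tag form depending on the output syntax. It is only a sketch and not jsoup's actual source: the class `VoidTagDemo`, the `Syntax` enum, and `renderEmpty` are all hypothetical names made up for this illustration, and the list of void tags is deliberately incomplete.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; this is not how jsoup is implemented.
class VoidTagDemo {
    enum Syntax { HTML, XML }

    // A few HTML void elements (not the full list): they never take an end tag.
    static final Set<String> VOID_TAGS =
            new HashSet<>(Arrays.asList("br", "img", "hr", "input", "meta", "link"));

    // Renders an element that has no attributes and no children.
    static String renderEmpty(String tagName, Syntax syntax) {
        if (!VOID_TAGS.contains(tagName)) {
            // Ordinary elements always get an explicit end tag.
            return "<" + tagName + "></" + tagName + ">";
        }
        // HTML syntax: bare start tag. XML syntax: self-closing tag.
        return syntax == Syntax.HTML ? "<" + tagName + ">" : "<" + tagName + " />";
    }

    public static void main(String[] args) {
        System.out.println(renderEmpty("br", Syntax.HTML)); // prints <br>
        System.out.println(renderEmpty("br", Syntax.XML));  // prints <br />
        System.out.println(renderEmpty("b", Syntax.HTML));  // prints <b></b>
    }
}
```

The point is that the input form (<br /> or <br>) is not what decides the output; the output syntax is.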

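You have to have the result written out as XML rather than HTML. That keeps the tags closed; it will even close them for you. Here is a fixed method, with some additional settings: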

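import java.nio.charset.StandardCharsets;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document.OutputSettings;
import org.jsoup.safety.Whitelist;

// allowedTags is assumed to be a String[] defined elsewhere in the class,
// e.g. { "b", "br", "font" } as in the question.
String cleanXmlAndRemoveUnwantedTags(String textToEscape) {
    Whitelist whitelist = Whitelist.none();
    whitelist.addTags(allowedTags);

    // XML syntax writes void elements as self-closing tags (<br />).
    OutputSettings outputSettings = new OutputSettings()
                    .syntax(OutputSettings.Syntax.xml)
                    .charset(StandardCharsets.UTF_8)
                    .prettyPrint(false);

    // The empty string is the base URI, which is unused here.
    String safe = Jsoup.clean(textToEscape, "", whitelist, outputSettings);
    return safe;
}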