Why do extra line breaks appear when converting HTML to text?

Time: 2019-03-13 23:27:13

Tags: java html jsoup

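I am using Jsoup to convert an HTML string to plain text. I want to keep the line breaks and ignore the HTML tags. However, after the conversion I get extra blank lines, and the text ends up pushed further down.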

Code:

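import org.jsoup.Jsoup;
import org.jsoup.examples.HtmlToPlainText; // assumed to be jsoup's bundled example converter class
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

String htmlString = "<p>Hello this is a description. </p><p>I know Just checking how it looks.</p><p></p><p><code>Add a line.</code></p><p>This is a notmal line <span style=\"color:#F9931A\">Adding orange</span></p><ul><li><p>one </p></li><li><p>two</p></li></ul>";
HtmlToPlainText convert = new HtmlToPlainText();
Document html = Jsoup.parse(htmlString, "", Parser.xmlParser());
String plainText = convert.getPlainText(html); // renamed: "new" is a reserved word in Java
System.out.println("This is the description: " + plainText);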
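
One possible workaround is to post-process the converted text and collapse each run of line breaks into a single one. The sketch below uses only the Java standard library. It is not part of Jsoup: the class name `BlankLineCollapser`, the method `collapseBlankLines`, and the sample string in `main` are all hypothetical and made up for illustration. It assumes the extra blank lines are simply consecutive newlines (for example, from the empty `<p></p>` and the nested `<li><p>` elements).

```java
import java.util.regex.Pattern;

public class BlankLineCollapser {
    // Optional trailing spaces/tabs, then one or more line breaks,
    // each optionally followed by spaces/tabs (i.e. whitespace-only lines).
    private static final Pattern BREAK_RUN =
            Pattern.compile("[ \\t]*(\\r?\\n[ \\t]*)+");

    /**
     * Collapses every run of line breaks into a single "\n" and trims the
     * result. Side effect: trailing spaces before a break and indentation
     * after it are removed as well.
     */
    static String collapseBlankLines(String text) {
        return BREAK_RUN.matcher(text).replaceAll("\n").trim();
    }

    public static void main(String[] args) {
        // Hypothetical converter output, NOT the real output from the question.
        String converted = "\nHello this is a description. \n\n"
                + "I know Just checking how it looks.\n\n\n\nAdd a line.\n";
        System.out.println(collapseBlankLines(converted));
    }
}
```

In the code above this would be applied to `plainText` before printing. Note that it also removes blank lines that were intentional, so it only fits cases where a single line break between blocks is enough.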

0 answers:

No answers