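Jsoup: replacing tags with a new line => new lines start with a space

Time: 2012-04-16 10:44:18

Tags: java html jsoup

I have to replace several HTML tags with a new line, for example the p tags you see in the sample code: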

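import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

String html = "<p>Zeile1</p><p>Zeile2</p><p>Zeile3</p><p>Zeile4</p>";
Document doc = Jsoup.parse(html);
// Append a literal "\n" marker (backslash + n) to every paragraph
doc.select("p").append("\\n");
// Replace each marker with the platform line separator
String sanitized = doc.text().replaceAll("\\\\n", System.getProperty("line.separator"));
System.out.println(sanitized);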

This is the output:

Zeile1
 Zeile2
 Zeile3
 Zeile4

As you can see, lines 2-4 start with a space. Where do these spaces come from, and how do I get rid of them?

1 answer:

Answer 0: (score: 3)

As @bdares suggested, you can iterate over the elements:

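import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

String html = "<p>Zeile1</p><p>Zeile2</p><p>Zeile3</p><p>Zeile4</p>";
Document doc = Jsoup.parse(html);
StringBuilder b = new StringBuilder();
// Collect the text of each paragraph, one per line
for (Element p : doc.select("p")) {
    b.append(p.text());
    b.append(System.getProperty("line.separator"));
}
System.out.println(b.toString());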

Output:

Zeile1
Zeile2
Zeile3
Zeile4
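
As for where the spaces come from: judging by the output in the question, doc.text() appears to join the text of block-level elements such as p with a single space, so every "\n" marker ends up followed by that space. If you would rather keep the marker approach from the question, one option is to let the regex swallow the optional space too. The following is only a rough sketch: the class name MarkerNewlines and the helper markersToNewlines are made up for illustration, and the input string is an assumed stand-in for what doc.text() returns, so the snippet runs without jsoup.

```java
import java.util.regex.Matcher;

final class MarkerNewlines {

    // Hypothetical helper: replaces each literal "\n" marker (backslash + n),
    // together with one optional space right after it, by lineSeparator.
    static String markersToNewlines(String text, String lineSeparator) {
        // quoteReplacement keeps '$' or '\' in the separator from being
        // interpreted as replacement syntax.
        return text.replaceAll("\\\\n ?", Matcher.quoteReplacement(lineSeparator));
    }

    public static void main(String[] args) {
        // Assumption: this mirrors doc.text() after doc.select("p").append("\\n")
        String text = "Zeile1\\n Zeile2\\n Zeile3\\n Zeile4\\n";
        System.out.print(markersToNewlines(text, System.lineSeparator()));
    }
}
```

Compared with the loop above, this keeps the single doc.text() call, but it relies on the separator being exactly one space, which is an assumption drawn from the output shown in the question.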