Question

I need to parse an HTML string. I have text like this:

<html>  <head>      </head>  <body>    <p style="margin-top: 0">      blbibibluboiubiubiu ibiub    </p>  </body></html>

I have removed the '\n' characters. Now I need to remove the '\t' characters. I tried to do it like this:

String s = editor.getText();
s = s.replaceAll("\\n", "");
s = s.replaceAll("\\t", "");

But it doesn't work. Please help.

Answer 1

If you are parsing HTML, I suggest you look at Jsoup or a similar framework. Jsoup can safely remove the line breaks and tabs for you.

Example:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

String html = "<html><head><title>First parse</title></head>"
    + "<body><p>Parsed HTML into a doc.</p></body></html>";

// Parse the string into a Document; you can then use it to read your elements
Document doc = Jsoup.parse(html);

The Jsoup Cookbook shows plenty of examples.

You can also use Jsoup to sanitize your data, i.e. remove unwanted attributes and tags.

// Note: jsoup 1.14.2 and later provide Safelist as the replacement for Whitelist
Jsoup.clean("<p>Some text</p>", Whitelist.none()); // -> "Some text"

Easy to set up, recommended!
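
If you would rather stay with plain string handling as in the question, here is a minimal sketch using only the standard library. It rests on an assumption: the sample in the question seems to contain runs of spaces rather than real tab characters, which would explain why removing "\t" appears to do nothing. The class and method names (WhitespaceCleaner, stripTabsAndNewlines, collapseWhitespace) are hypothetical, made up for illustration. Regex cleanup like this is not HTML-aware, so it would also change whitespace inside elements such as <pre>.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; not part of the JDK or Jsoup.
class WhitespaceCleaner {
    // Matches any run of whitespace (spaces, tabs, newlines, carriage returns).
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");
    // Matches whitespace sitting between a closing '>' and an opening '<'.
    private static final Pattern BETWEEN_TAGS = Pattern.compile(">\\s+<");

    // Removes only literal tab, newline, and carriage-return characters.
    // String.replace takes plain text, so no regex escaping is involved.
    static String stripTabsAndNewlines(String s) {
        return s.replace("\t", "").replace("\n", "").replace("\r", "");
    }

    // Collapses every whitespace run to a single space, then drops the
    // whitespace that is left between adjacent tags.
    static String collapseWhitespace(String s) {
        String collapsed = WHITESPACE_RUN.matcher(s).replaceAll(" ");
        return BETWEEN_TAGS.matcher(collapsed).replaceAll("><").trim();
    }

    public static void main(String[] args) {
        String html = "<html>\n\t<body>\n\t\t<p>  some   text  </p>\n\t</body>\n</html>";
        System.out.println(stripTabsAndNewlines(html));
        System.out.println(collapseWhitespace(html));
    }
}
```

The first method leaves ordinary spaces alone, so it only helps when the input really contains tab characters. The second one is probably closer to what you want for the indented markup shown in the question.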

Remove '\t' and '\n' from a string
