Question

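I need to find and replace a list of words provided by the user. My application reads an HTML file line by line, and I want to check whether each line contains any of the words in the list and replace them with a blank. This is what I have right now, but I suspect I'll have to change my whole code to get what I want.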

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.PrintWriter;

    private static void PrintFile(File source) throws IOException {
        // try-with-resources closes both streams; closing the PrintWriter
        // also flushes it, otherwise Results.txt may end up empty or truncated.
        try (BufferedReader br = new BufferedReader(new FileReader(source));
             PrintWriter pw = new PrintWriter("Results.txt")) {
            String s;
            while ((s = br.readLine()) != null) {
                pw.println(s.replaceAll("&#160;", "") // Words to be replaced.
                        .replaceAll("<br>", "")
                        .replaceAll("&amp;", "")
                        .replaceAll("</p>", "")
                        .replaceAll("</body>", "")
                        .replaceAll("</html>", "")
                        .replaceAll("<remote object=\"#DEFAULT\">&gt;", ""));
            }
        }
        System.out.println("Done!");
    }

I'm open to any suggestions; keeping the words in a list may not be the best approach.
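A minimal sketch of the list idea, assuming the words arrive as a `List<String>` (the class and `removeAll` helper are hypothetical names for illustration). Each entry is removed with `String.replace`, which matches literally, so the hard-coded chain becomes a loop over whatever the user supplies:

```java
import java.util.Arrays;
import java.util.List;

public class ListReplace {
    // Hypothetical helper: removes every entry of `words` from `line`.
    // String.replace treats each entry as plain text (no regex),
    // so characters such as '.', '$' or '#' need no escaping.
    static String removeAll(String line, List<String> words) {
        String result = line;
        for (String w : words) {
            result = result.replace(w, "");
        }
        return result;
    }

    public static void main(String[] args) {
        // The same entries the question hard-codes, now kept in one list.
        List<String> words = Arrays.asList(
                "&#160;", "<br>", "&amp;", "</p>", "</body>", "</html>",
                "<remote object=\"#DEFAULT\">&gt;");
        System.out.println(removeAll("Hello&#160;world<br></p>", words)); // prints Helloworld
    }
}
```

Inside the question's `while` loop this would become `pw.println(removeAll(s, words));`. Entries are applied in list order, so if one word contains another, put the longer one first.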

Answer 1

You can use Jsoup to strip the HTML tags; it's as simple as:

    import org.jsoup.Jsoup;

    public static String html2text(String html) {
        // Parse the HTML and return only its combined text content.
        return Jsoup.parse(html).text();
    }

Also take a look at Cleaner and Whitelist if you need to sanitize the document separately.

Answer 2

Since String.replaceAll(String regex, String replacement) takes a regular expression as its first argument, I'd suggest using String.replace(CharSequence target, CharSequence replacement) instead, to avoid unexpected behavior.

Other than that, I don't see a major problem in your code.
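A small illustration of the difference, using a made-up input string: a `.` is a regex metacharacter, so `replaceAll` removes every character, while `replace` removes only literal dots. `Pattern.quote` is shown as the way to keep `replaceAll` literal if a regex call is needed anyway.

```java
import java.util.regex.Pattern;

public class ReplaceVsReplaceAll {
    public static void main(String[] args) {
        String s = "v1.2.3";
        // replaceAll parses "." as "any character", so everything is removed.
        System.out.println(s.replaceAll(".", ""));                 // prints an empty line
        // replace treats "." literally and removes only the dots.
        System.out.println(s.replace(".", ""));                    // prints v123
        // Pattern.quote turns the regex into a literal match.
        System.out.println(s.replaceAll(Pattern.quote("."), ""));  // prints v123
    }
}
```

None of the strings in the question contain regex metacharacters, which is why `replaceAll` happens to work there; it would break as soon as a user-supplied word contains something like `.`, `*` or `(`.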

Answer 3

If you don't mind adding Apache Commons Lang to your project, you can use StringUtils.replaceEach and be done with it.
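If adding the dependency isn't an option, a similar single-pass replacement can be approximated with the standard library alone. This is a swapped-in technique, not `StringUtils.replaceEach` itself: a sketch, assuming one common replacement for all words, that joins the `Pattern.quote`-escaped words into one alternation regex and replaces all matches in a single scan. `SinglePassRemove`, `buildPattern` and `removeAll` are hypothetical names.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SinglePassRemove {
    // Hypothetical helper: builds one regex "w1|w2|..." from the word list.
    // Pattern.quote escapes each word so it is matched literally.
    static Pattern buildPattern(List<String> words) {
        if (words.isEmpty()) {
            // An empty alternation would match the empty string everywhere.
            throw new IllegalArgumentException("word list must not be empty");
        }
        String alternation = words.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile(alternation);
    }

    // Replaces every match in a single scan; quoteReplacement keeps '$' and '\' literal.
    static String removeAll(String line, Pattern pattern, String replacement) {
        return pattern.matcher(line).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        Pattern p = buildPattern(List.of("<br>", "&amp;", "</p>"));
        // prints "a b c " (note the trailing space)
        System.out.println(removeAll("a<br>b&amp;c</p>", p, " "));
    }
}
```

The pattern is compiled once and can be reused for every line of the file. When two words can match at the same position, the one listed first wins, so list longer words before their prefixes.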

Match words in a list against a string array in Java
