Question

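I am writing a method that should replace every word that matches an entry in a list with '****' characters. So far my code works, but all words containing special characters are ignored.

I tried using "\\W" in my expression, but it looks like I am not using it correctly, so I could use some help.

Here is my code so far: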

    for (int i = 0; i < badWords.size(); i++) {
        if (StringUtils.containsIgnoreCase(stringToCheck, badWords.get(i))) {
            stringToCheck = stringToCheck.replaceAll("(?i)\\b" + badWords.get(i) + "\\b", "****");
        }
    }

E.g. I have the word list ['bad', '@$$'].

If I have the string "This is bad string with @$$", I want this method to return "This is **** string with ****".

Note that the method should ignore case, e.g. TesT and test should be treated the same.

Answer 1

I am not sure why you are using StringUtils; you can directly replace the words that match the bad words. This code works for me:

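import java.util.ArrayList;
import java.util.List;

public class Main {
    public static void main(String[] args) {
        List<String> badWords = new ArrayList<>();
        badWords.add("test");
        badWords.add("BadTest");
        badWords.add("\\$\\$"); // "$" is a regex metacharacter, so it is escaped
        String test = "This is a TeSt and a $$ with Badtest.";
        for (String badWord : badWords) {
            // (?i) makes the match case-insensitive
            test = test.replaceAll("(?i)" + badWord, "****");
        }
        // "test" is replaced inside "Badtest" first, leaving "Bad****";
        // fold any word characters left in front of the stars into the mask
        test = test.replaceAll("\\w*\\*{4}", "****");
        System.out.println(test);
    }
}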

Output:

This is a **** and a **** with ****.

Answer 2

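The problem is that these special characters, e.g. $, are regex control characters rather than literal characters. You need to escape the following characters wherever they occur in a bad word, using two backslashes in the Java string literal: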

{}()\[].+*?^$|
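As an alternative to escaping each character by hand, `java.util.regex.Pattern.quote` wraps a string so that the regex engine treats all of it literally. A minimal sketch follows; the class name `QuoteDemo` and the method `mask` are illustrative names, not part of the answers here.

```java
import java.util.regex.Pattern;

public class QuoteDemo {
    // Replaces every case-insensitive occurrence of a literal word with "****".
    // Hypothetical helper, shown only to illustrate Pattern.quote.
    static String mask(String text, String badWord) {
        // Pattern.quote makes "$", "@", "." and the other metacharacters match themselves.
        return text.replaceAll("(?i)" + Pattern.quote(badWord), "****");
    }

    public static void main(String[] args) {
        // The bad word is passed unescaped; quoting happens inside mask().
        System.out.println(mask("This is bad string with @$$", "@$$"));
    }
}
```

With this approach the bad-word list can hold the plain words, and no entry needs to be stored in escaped form.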

Answer 3

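My guess is that your bad-word list contains special characters that have a particular meaning when interpreted as a regular expression (which is what the replaceAll method does). For example, $ normally matches the end of a string/line. So I would recommend a combination of things:

Don't use containsIgnoreCase to decide whether a replacement is needed. Just let replaceAll run every time; if nothing in the string matches the bad word, the string is left unchanged.
Characters such as $ that have a special meaning in regular expressions should be escaped when they are added to the bad-word list. For example, badWords.add("@\\$\\$");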
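Both recommendations can be sketched together. One extra detail, which is an assumption on my part rather than something stated in this answer: `\b` only matches between a word character and a non-word character, so it never matches between a space and `@`, nor between `$` and the end of the string. The sketch below therefore uses the lookarounds `(?<!\w)` and `(?!\w)` as boundaries. The names `BadWordFilter` and `censor` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class BadWordFilter {
    // Runs replaceAll for every bad word, with no containsIgnoreCase pre-check.
    static String censor(String text, List<String> badWords) {
        for (String word : badWords) {
            // (?i)       : case-insensitive match
            // (?<!\w)    : no word character directly before the match
            // quote(word): the bad word, taken literally
            // (?!\w)     : no word character directly after the match
            String regex = "(?i)(?<!\\w)" + Pattern.quote(word) + "(?!\\w)";
            text = text.replaceAll(regex, "****");
        }
        return text;
    }

    public static void main(String[] args) {
        List<String> badWords = Arrays.asList("bad", "@$$");
        System.out.println(censor("This is bad string with @$$", badWords));
    }
}
```

Because of the lookarounds, a longer word such as "badly" is left alone, while "BAD" on its own is masked.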

Answer 4

Try something like this:

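    String stringToCheck = "This is b!d string with @$$";
    List<String> badWords = Arrays.asList("b!d", "@$$"); // needs java.util.Arrays and java.util.List
    for (int i = 0; i < badWords.size(); i++) {
        if (StringUtils.containsIgnoreCase(stringToCheck, badWords.get(i))) {
            // Note: [...]+ is a character class, so this replaces any run of the
            // listed characters (e.g. a lone "b" or "d"), not only the exact word.
            stringToCheck = stringToCheck.replaceAll("[" + badWords.get(i) + "]+", "****");
        }
    }
    System.out.println(stringToCheck);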

Answer 5

Another solution: match the bad words at word boundaries (and case-insensitively).

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Main {
        public static void main(String[] args) {
            // One alternation of all bad words, anchored at word boundaries
            Pattern badWords = Pattern.compile("\\b(a|b|ĉĉĉ|dddd)\\b",
                    Pattern.UNICODE_CASE | Pattern.CASE_INSENSITIVE);
            String text = "adfsa a dfs bb addfdsaf ĉĉĉ adsfs dddd asdfaf a";
            Matcher m = badWords.matcher(text);
            StringBuffer sb = new StringBuffer(text.length());
            while (m.find()) {
                // Replace each match with as many stars as it has characters
                m.appendReplacement(sb, stars(m.group(1)));
            }
            m.appendTail(sb);
            String cleanText = sb.toString();
            System.out.println(text);
            System.out.println(cleanText);
        }

        private static String stars(String s) {
            // "." consumes one code point per match; (?s) lets it match line terminators too
            return s.replaceAll("(?su).", "*");
            /*
            int cpLength = s.codePointCount(0, s.length());
            final String stars = "******************************";
            return cpLength >= stars.length() ? stars : stars.substring(0, cpLength);
            */
        }
    }

And then (in the comment) the stars with the correct count: just one star for a Unicode code point that is encoded as a surrogate pair (two UTF-16 chars).
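The code-point counting idea can also be combined with a pattern built from a list of plain words. The following is only a sketch under my own assumptions: the names `StarMasker`, `compile`, `stars`, and `mask` are hypothetical, the list is assumed to be non-empty, and lookarounds are used instead of `\b` so that words starting or ending with symbols still match.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class StarMasker {
    // Builds one alternation from the list, quoting each word so it is matched literally.
    static Pattern compile(List<String> badWords) {
        String alternation = badWords.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile("(?<!\\w)(?:" + alternation + ")(?!\\w)",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    // One star per Unicode code point, so a surrogate pair yields a single star.
    static String stars(String s) {
        int count = s.codePointCount(0, s.length());
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            sb.append('*');
        }
        return sb.toString();
    }

    // Replaces each matched bad word with a run of stars of the same length.
    static String mask(String text, List<String> badWords) {
        Matcher m = compile(badWords).matcher(text);
        StringBuffer sb = new StringBuffer(text.length());
        while (m.find()) {
            // A string of '*' contains no '$' or '\', so it is safe as a replacement.
            m.appendReplacement(sb, stars(m.group()));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

For example, `StarMasker.mask("This is bad string with @$$", Arrays.asList("bad", "@$$"))` masks both words with three stars each, since each is three code points long.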

Replacing words that contain special characters in a string in Java