Question

How can I trim runs of excessive non-digit, non-letter characters, like the ones below?

String test = "Hey this is a string with lots of symbols!!!!!@@@@@#####";

The output should be:

Hey this is a string with lots of symbols!@#

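This is what I have at the moment, but it has some strange side effects and it is far too bulky:

(The first goal is to trim the symbols; the second is to get it down to a 2-3 liner.)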

    String precheck = message.replaceAll("[a-zA-Z]", "");

    precheck = precheck.replaceAll("[0-9]+/*\\.*[0-9]*", "");
    precheck = precheck.trim();

    String[] allowed = {
            "!","\"","'","-",">","<","+","_"+"^","@","#","=","/","\\"
    };

    for(char c : precheck.toString().toCharArray())
    {
        boolean contains = false;
        for(String symbol : allowed)
        {
            if(c == symbol.toCharArray()[0]){
                contains = true;
            }
        }

        if(!contains){
            message = message.replace(String.valueOf(c), "");
            message = message.trim();
        }
    }

    for(String symbol : allowed)
    {
        if (message.contains(symbol)){
            int count = 0;

            for (int i = 0; i < message.length(); i++){
                if (message.charAt(i) == symbol.toCharArray()[0]){
                    count++;
                }
            }

            if(count > 2){
                for(int i = 0;i < (count-2);i++){
                    message = message.replaceFirst(symbol, "");
                }
            }
        }
    }

    return message;

Answer 1

You can use this regex replacement:

str = str.replaceAll("([^\\p{L}\\p{N}])\\1+", "$1");

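Explanation: this regex matches any character that is neither a letter nor a digit and captures it as group #1. It then uses \1+ to match one or more further instances of that same captured character, and the whole run is replaced with the first character, i.e. $1.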

PS: this lookahead-based regex also works:

str = str.replaceAll("([^\\p{L}\\p{N}])(?=\\1+)", "");
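
To make the behaviour of the two variants concrete, here is a minimal sketch that wraps both in helper methods. The class and method names (RepeatCollapseDemo, collapseWithBackreference, collapseWithLookahead) are made up for illustration and are not part of the answer; the patterns are the ones shown above.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the two regexes come from the answer.
class RepeatCollapseDemo {
    // A non-letter, non-digit character followed by one or more copies of itself.
    private static final Pattern REPEATED =
            Pattern.compile("([^\\p{L}\\p{N}])\\1+");

    // The same kind of character, matched only when another copy follows it.
    private static final Pattern FOLLOWED_BY_SAME =
            Pattern.compile("([^\\p{L}\\p{N}])(?=\\1+)");

    // Replaces each run with the single captured character.
    static String collapseWithBackreference(String s) {
        return REPEATED.matcher(s).replaceAll("$1");
    }

    // Deletes every symbol that is followed by a copy of itself,
    // so only the last character of each run survives.
    static String collapseWithLookahead(String s) {
        return FOLLOWED_BY_SAME.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String test = "Hey this is a string with lots of symbols!!!!!@@@@@#####";
        System.out.println(collapseWithBackreference(test));
        System.out.println(collapseWithLookahead(test));
        // Whitespace is also "non-letter, non-digit", so runs of spaces collapse too.
        System.out.println(collapseWithBackreference("wait...   what??"));
    }
}
```

Note that both patterns treat whitespace and every other symbol the same way, so repeated spaces are collapsed as well. If that is not wanted, the character class would need to be narrowed.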

Answer 2

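Since you have already defined a whitelist, I suggest this approach: match every repeated run of an allowed symbol character and keep only the first one.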

([!"'><+_^@#=/\\-])\1+

In Java:

String test = "Hey this is a string with lots of symbols!!!!!@@@@@#####";

test = test.replaceAll("([!\"'><+_^@#=/\\\\-])\\1+", "$1");

Result:

"Hey this is a string with lots of symbols!@#"

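As a rough sketch of how the whitelist variant differs from a catch-all pattern, the example below precompiles the same regex and shows that symbols outside the whitelist are left alone. The class name WhitelistCollapseDemo and the method collapseAllowed are hypothetical names chosen for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the character class is the whitelist from the answer.
class WhitelistCollapseDemo {
    // One whitelisted symbol followed by one or more copies of itself.
    // In the Java literal, \" is a quote and \\\\ is a single literal backslash.
    private static final Pattern REPEATED_ALLOWED =
            Pattern.compile("([!\"'><+_^@#=/\\\\-])\\1+");

    // Collapses each run of a whitelisted symbol to a single character.
    static String collapseAllowed(String s) {
        return REPEATED_ALLOWED.matcher(s).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(collapseAllowed(
                "Hey this is a string with lots of symbols!!!!!@@@@@#####"));
        // '?' and ' ' are not in the whitelist, so these runs stay untouched.
        System.out.println(collapseAllowed("really???   yes--no"));
    }
}
```

Precompiling the pattern is only a design choice for code that runs the replacement many times; for a one-off call, String.replaceAll as in the answer does the same thing.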