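Java: trimming excess symbols

Time: 2015-01-11 06:41:18

Tags: java regex

How can I trim runs of repeated characters that are neither digits nor letters, as in the following string: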

String test = "Hey this is a string with lots of symbols!!!!!@@@@@#####";

The output should be:

Hey this is a string with lots of symbols!@#

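This is what I currently have, but it has some odd side effects and it is far too clunky:

(The first goal is to trim the symbols; the second goal is to get this down to a 2-3 liner.)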

    String precheck = message.replaceAll("[a-zA-Z]", "");

    precheck = precheck.replaceAll("[0-9]+/*\\.*[0-9]*", "");
    precheck = precheck.trim();

    String[] allowed = {
            "!","\"","'","-",">","<","+","_"+"^","@","#","=","/","\\"
    };

    for(char c : precheck.toString().toCharArray())
    {
        boolean contains = false;
        for(String symbol : allowed)
        {
            if(c == symbol.toCharArray()[0]){
                contains = true;
            }
        }

        if(!contains){
            message = message.replace(String.valueOf(c), "");
            message = message.trim();
        }
    }

    for(String symbol : allowed)
    {
        if (message.contains(symbol)){
            int count = 0;

            for (int i = 0; i < message.length(); i++){
                if (message.charAt(i) == symbol.toCharArray()[0]){
                    count++;
                }
            }

            if(count > 2){
                for(int i = 0;i < (count-2);i++){
                    message = message.replaceFirst(symbol, "");
                }
            }
        }
    }

    return message;

2 answers:

Answer 0 (score: 1)

You can use this regex replacement:

str = str.replaceAll("([^\\p{L}\\p{N}])\\1+", "$1");

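Explanation: this regex matches any character that is neither a letter nor a digit and captures it as group #1. It then uses \1+ to match one or more further occurrences of that same captured character, and the whole run is replaced with the captured character, i.e. $1.

PS: This lookahead-based regex also works: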
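
As a minimal sketch of the backreference approach described above, the class below wraps the same regex in a helper. The class name `RepeatedSymbolCollapser` and the method `collapse` are hypothetical and chosen only for illustration; the pattern itself is the one from this answer.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names here are illustrative only.
final class RepeatedSymbolCollapser {
    // Group 1: one character that is neither a letter nor a digit.
    // \1+ : one or more further copies of that same character.
    private static final Pattern REPEATED_SYMBOL =
            Pattern.compile("([^\\p{L}\\p{N}])\\1+");

    static String collapse(String input) {
        // "$1" keeps a single copy of each repeated symbol.
        return REPEATED_SYMBOL.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        // prints: Hey this is a string with lots of symbols!@#
        System.out.println(collapse("Hey this is a string with lots of symbols!!!!!@@@@@#####"));
        // prints: wait. what?
        System.out.println(collapse("wait...   what??"));
    }
}
```

Note that whitespace is also neither a letter nor a digit, so runs of spaces are collapsed to a single space as well, as the second call shows.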

str = str.replaceAll("([^\\p{L}\\p{N}])(?=\\1+)", "");
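
To illustrate how the lookahead variant behaves, here is a small sketch; the class name `LookaheadCollapser` and the method `collapse` are hypothetical. Instead of replacing a whole run with one character, it deletes every symbol that is immediately followed by the same symbol, so only the last character of each run survives.

```java
// Hypothetical helper class; the names here are illustrative only.
final class LookaheadCollapser {
    static String collapse(String input) {
        // Match a single symbol only if the same symbol follows it.
        // The lookahead (?=...) consumes nothing, so each match is one character.
        return input.replaceAll("([^\\p{L}\\p{N}])(?=\\1+)", "");
    }

    public static void main(String[] args) {
        // prints: Hey this is a string with lots of symbols!@#
        System.out.println(collapse("Hey this is a string with lots of symbols!!!!!@@@@@#####"));
        // prints: a!@!b  (no adjacent duplicates, so nothing is removed)
        System.out.println(collapse("a!@!b"));
    }
}
```

For this input the result is the same as with the backreference replacement; the difference is only in how the duplicates are removed.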

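Answer 1 (score: 0)

Since you have already defined a whitelist, I suggest this approach: match every run of a repeated allowed symbol and keep only the first character.

([!"'><+_^@#=/\\-])\1+

In Java: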

String test = "Hey this is a string with lots of symbols!!!!!@@@@@#####";

test = test.replaceAll("([!\"'><+_^@#=/\\\\-])\\1+", "$1");

Result:

"Hey this is a string with lots of symbols!@#"
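
The whitelist approach can be sketched as a self-contained class. The class name `WhitelistCollapser` and the method `collapse` are hypothetical; the character class is the one shown in this answer, with the double quote and backslash escaped for a Java string literal.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names here are illustrative only.
final class WhitelistCollapser {
    // Only the symbols listed in the character class are collapsed.
    // The '-' is placed last so it is read as a literal, not as a range.
    private static final Pattern REPEATED_ALLOWED =
            Pattern.compile("([!\"'><+_^@#=/\\\\-])\\1+");

    static String collapse(String input) {
        return REPEATED_ALLOWED.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        // prints: Hey this is a string with lots of symbols!@#
        System.out.println(collapse("Hey this is a string with lots of symbols!!!!!@@@@@#####"));
        // prints: wait... what??  ('.' and '?' are not on the whitelist)
        System.out.println(collapse("wait... what??"));
    }
}
```

Unlike the `[^\\p{L}\\p{N}]` pattern, this version leaves repeated characters that are not on the whitelist untouched, and it does not remove non-whitelisted symbols the way the code in the question tries to.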