How can I trim runs of excess non-digit, non-letter characters, as in the following:
String test = "Hey this is a string with lots of symbols!!!!!@@@@@#####";
The output should be:
Hey this is a string with lots of symbols!@#
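This is what I currently have, but it has some strange side effects and it is far too clunky:
(The first goal is to trim the symbols; the second is to get it down to a 2-3 liner.)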
String precheck = message.replaceAll("[a-zA-Z]", "");
precheck = precheck.replaceAll("[0-9]+/*\\.*[0-9]*", "");
precheck = precheck.trim();
String[] allowed = {
    "!","\"","'","-",">","<","+","_"+"^","@","#","=","/","\\"
};
for(char c : precheck.toString().toCharArray())
{
    boolean contains = false;
    for(String symbol : allowed)
    {
        if(c == symbol.toCharArray()[0]){
            contains = true;
        }
    }
    if(!contains){
        message = message.replace(String.valueOf(c), "");
        message = message.trim();
    }
}
for(String symbol : allowed)
{
    if (message.contains(symbol)){
        int count = 0;
        for (int i = 0; i < message.length(); i++){
            if (message.charAt(i) == symbol.toCharArray()[0]){
                count++;
            }
        }
        if(count > 2){
            for(int i = 0;i < (count-2);i++){
                message = message.replaceFirst(symbol, "");
            }
        }
    }
}
return message;
Answer 0 (score: 1)
You can use this regex replacement:
str = str.replaceAll("([^\\p{L}\\p{N}])\\1+", "$1");
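Explanation: this regex matches any single non-digit, non-letter character and captures it as group #1. It then uses \1+ to match one or more further instances of that same captured character, and the whole run is replaced with the first part, i.e. $1.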
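To make the explanation above concrete, here is a minimal sketch that wraps the same pattern in a small helper. The class name RepeatedSymbolCollapser and the method name collapse are made up for this illustration and do not come from the question; the regex itself is the one from this answer.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
final class RepeatedSymbolCollapser {
    // Group 1 captures one character that is neither a Unicode letter (\p{L})
    // nor a Unicode number (\p{N}); \1+ then requires one or more further
    // copies of that same character, so a lone symbol is never matched.
    private static final Pattern REPEATED_SYMBOL =
            Pattern.compile("([^\\p{L}\\p{N}])\\1+");

    // Replaces every run of a repeated symbol with a single copy of it.
    static String collapse(String input) {
        return REPEATED_SYMBOL.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        String test = "Hey this is a string with lots of symbols!!!!!@@@@@#####";
        // prints: Hey this is a string with lots of symbols!@#
        System.out.println(collapse(test));
    }
}
```

Note that whitespace is also a non-letter, non-digit character, so with this pattern a run of several spaces would be collapsed to one space as well. Whether that is desirable depends on the input.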
PS: this lookahead-based regex would also work:
str = str.replaceAll("([^\\p{L}\\p{N}])(?=\\1+)", "");
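The lookahead variant works slightly differently: instead of replacing a whole run with one character, it deletes each symbol that is immediately followed by another copy of itself, so only the last character of each run survives. A small sketch, assuming a hypothetical class LookaheadSymbolCollapser and method collapseWithLookahead that exist only for this example:

```java
// Hypothetical helper class, named for illustration only.
final class LookaheadSymbolCollapser {
    // Deletes every non-letter, non-digit character that is directly followed
    // by at least one more copy of itself. The lookahead (?=\1+) checks for
    // the following copies without consuming them, so the last one is kept.
    static String collapseWithLookahead(String input) {
        return input.replaceAll("([^\\p{L}\\p{N}])(?=\\1+)", "");
    }

    public static void main(String[] args) {
        String test = "Hey this is a string with lots of symbols!!!!!@@@@@#####";
        // prints: Hey this is a string with lots of symbols!@#
        System.out.println(collapseWithLookahead(test));
    }
}
```

For the sample string both variants produce the same result; the difference is only in which character of each run is the one that remains.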
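Answer 1 (score: 0)
Since you have already defined a whitelist, I would suggest this approach: match every run of a repeated allowed symbol character and keep only the first one.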
([!"'><+_^@#=/\\-])\1+
In Java:
String test = "Hey this is a string with lots of symbols!!!!!@@@@@#####";
test = test.replaceAll("([!\"'><+_^@#=/\\\\-])\\1+", "$1");
Result:
"Hey this is a string with lots of symbols!@#"