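I'm trying to capture several different kinds of strings, and every change I make moves me one step forward and one step back. I need any string that repeats a char + space + char or char + non-alphanumeric + char pattern several times. Basically, these instances should be captured: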
w o r d
w.o.r.d
w_o_r_d
w%o%r%d
But the words w, word.w, or w.word should not be captured.
I've tried various regex patterns:
(?:\S+\s){2}([a-zA-Z][^a-zA-Z0-9])+[a-zA-Z]+
[$-:-?{-~!"^_`\[\]]
([a-zA-z][$-:-?{-~!"^_`\[\]^]{1})
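One concrete problem in the third attempt: the range A-z in [a-zA-z] runs from U+0041 ('A') to U+007A ('z'), so besides letters it also admits the six characters between 'Z' and 'a': [ \ ] ^ _ and the backtick. A minimal Java check (the class and method names are illustrative) compares that class with [a-zA-Z]:

```java
import java.util.regex.Pattern;

// Illustrative check; RangeCheck and inClass are hypothetical names.
class RangeCheck {
    // True if the single character c is matched by the character class cls.
    static boolean inClass(String cls, char c) {
        return Pattern.matches(cls, String.valueOf(c));
    }

    public static void main(String[] args) {
        // These six characters sit between 'Z' (U+005A) and 'a' (U+0061).
        for (char c : new char[] {'[', '\\', ']', '^', '_', '`'}) {
            // Each line prints "<char>: true vs false".
            System.out.println(c + ": " + inClass("[a-zA-z]", c) + " vs " + inClass("[a-zA-Z]", c));
        }
    }
}
```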
Each of them gets me partway there at best. Any help would be greatly appreciated.
Answer 0 (score: 1)
I tried the following regex:
((\w[^a-zA-Z\d])+\w[^a-zA-Z\d]?)
See the explanation here: https://regex101.com/r/pM1dV0/6
Answer 1 (score: 1)
As the question currently stands, this seems to solve it:
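To try this pattern on the question's samples from Java, a small harness might look like the following. The class name, method name, and sample list are illustrative; the regex is the one above, written as a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative harness; Answer0Check and findAll are hypothetical names.
class Answer0Check {
    // The pattern from the answer, unchanged apart from Java string escaping.
    static final Pattern P = Pattern.compile("((\\w[^a-zA-Z\\d])+\\w[^a-zA-Z\\d]?)");

    // Returns every substring that find() reports, in order.
    static List<String> findAll(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String[] samples = {"w o r d", "w.o.r.d", "w_o_r_d", "w%o%r%d", "w", "word.w", "w.word"};
        for (String s : samples) {
            System.out.println(s + " -> " + findAll(s));
        }
    }
}
```

With these samples the four target strings come back whole and w gives no match, but find() also returns w.w from w.word and d.w from word.w, because nothing in the pattern requires a match to start or end at a token boundary.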
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String input =
                "fo.o w o r d bar, " +
                "fo-o w.o.r.d bar, " +
                "f-oo w_o_r_d bar, " +
                "fo_o w%o%r%d bar, " +
                "f.o.o b-a.r";
        Pattern p = Pattern.compile(
                "(?<=\\s|^)[a-z0-9]" +           // start of the token
                "(" +
                "(?<=\\s[a-z0-9])\\s[a-z0-9]" +  // is continuation of `a b...`
                "(?=\\s|$)" +                    // and is not start of token like `x.y.z`
                "|" +
                "[^a-z0-9 ][a-z0-9]" +           // spaces are special, and are handled earlier
                "(?![a-z])" +                    // is not start of `aaa`
                ")+",                            // second part like _b must appear at least once
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}
Output:
w o r d
w.o.r.d
w_o_r_d
w%o%r%d
f.o.o
b-a.r
(Note that f.o.o and b-a.r are handled as separate tokens.)
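The pattern can also be checked against the question's negative examples and against a spaced token at the very start of the input. In the sketch below the class and method names are hypothetical, and VARIANT is an assumed tweak, not part of the answer: the continuation lookbehind (?<=\s[a-z0-9]) needs a whitespace character before the token's first letter, so the variant also lets it accept the start of input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative harness; TokenCheck, tokens and VARIANT are hypothetical names.
class TokenCheck {
    // The answer's pattern, concatenated into one string.
    static final Pattern ANSWER = Pattern.compile(
            "(?<=\\s|^)[a-z0-9]"
            + "((?<=\\s[a-z0-9])\\s[a-z0-9](?=\\s|$)"
            + "|[^a-z0-9 ][a-z0-9](?![a-z]))+",
            Pattern.CASE_INSENSITIVE);

    // Assumed variant: the continuation lookbehind also accepts the start of input,
    // so a space-separated token at position 0 is matched in full.
    static final Pattern VARIANT = Pattern.compile(
            "(?<=\\s|^)[a-z0-9]"
            + "((?<=(?:\\s|^)[a-z0-9])\\s[a-z0-9](?=\\s|$)"
            + "|[^a-z0-9 ][a-z0-9](?![a-z]))+",
            Pattern.CASE_INSENSITIVE);

    // Collects every substring that find() reports, in order.
    static List<String> tokens(Pattern p, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The question's negative examples: each prints an empty list.
        for (String s : new String[] {"w", "word.w", "w.word"}) {
            System.out.println(s + " -> " + tokens(ANSWER, s));
        }
        // A space-separated token at the very start of the input.
        System.out.println(tokens(ANSWER, "w o r d"));  // [o r d]
        System.out.println(tokens(VARIANT, "w o r d")); // [w o r d]
    }
}
```

As written, the answer's pattern reports nothing for w, word.w, and w.word; the only difference the variant makes is for a spaced token that begins at index 0 of the input.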
Answer 2 (score: -1)