I am trying to get a String[] from a .txt file, and I need to remove all punctuation, with a few exceptions. Here is my code:
replaceAll("[^a-zA-Z ]", "");
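For reference, here is a small sketch of what that single `replaceAll` call does on its own. The class name, helper name and sample line are hypothetical, chosen only to show that the pattern deletes hyphens and digits and leaves the letters of every word behind, so none of the three exceptions below are handled.

```java
public class OriginalReplace {
    // Applies the question's pattern: deletes every character that is not an ASCII letter or a space.
    static String stripAllButLetters(String text) {
        return text.replaceAll("[^a-zA-Z ]", "");
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not taken from the question.
        String sample = "well-known r2d2 --end-- ok.";
        System.out.println(stripAllButLetters(sample)); // wellknown rd end ok
    }
}
```

The hyphen in "well-known" is lost, "r2d2" survives as "rd", and "--end--" survives as "end".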
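Exceptions:
1. Keep hyphens that are inside a word.
2. Remove words that contain digits.
3. Remove words that begin or end with two punctuation marks.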
Answer 0 (score: 0)
[^a-zA-Z ] is a character class. It matches exactly one character, in this case any single character that is not a-z, A-Z or a space.
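A quick way to see that a bare character class consumes one character per match is to collect every `Matcher.find()` result. This is a minimal sketch; the class name, the `findAll` helper and the sample string are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SingleCharClass {
    // Collects every substring found by repeated Matcher.find() calls.
    static List<String> findAll(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Without a quantifier, each match is exactly one character long.
        System.out.println(findAll("[^a-zA-Z ]", "ab--- 42")); // [-, -, -, 4, 2]
    }
}
```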
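If you want to match whole words, you need a character class with a quantifier such as +. If you want to match several different patterns, you need the alternation (OR) operator |.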
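The effect of a quantifier and of alternation can be compared with the same kind of helper. In this sketch the class name, helper and sample strings are hypothetical, and the alternation `[0-9]+|-{2,}` is only an illustration: it matches runs of digits or runs of two or more hyphens, so the single hyphen in "a-b" is left alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierAlternation {
    // Collects every substring found by repeated Matcher.find() calls.
    static List<String> findAll(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // With +, a run of matching characters becomes a single match.
        System.out.println(findAll("[^a-zA-Z ]+", "ab--- 42"));               // [---, 42]
        // With |, either alternative may produce a match.
        System.out.println(findAll("[0-9]+|-{2,}", "a-b c--d 7 --- 42"));    // [--, 7, ---, 42]
    }
}
```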
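Knowing this, you can now match words that end with one or more digits or that have digits in the middle:
[^a-zA-Z ][0-9]+|[^a-zA-Z ]+[0-9]
I'll leave applying this to your three exceptions as an exercise, since it sounds like a school assignment.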
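As one possible take on exception 2 only, the sketch below assumes that a "word" is any whitespace-delimited token and uses `\S*\d\S*` (a token containing at least one digit). That pattern is my substitution for illustration, not the pattern given in this answer; the class and method names are hypothetical.

```java
public class DropDigitWords {
    // Deletes every whitespace-delimited token that contains at least one digit,
    // then collapses the leftover runs of whitespace into single spaces.
    static String dropDigitWords(String text) {
        return text.replaceAll("\\S*\\d\\S*", "")
                   .replaceAll("\\s+", " ")
                   .trim();
    }

    public static void main(String[] args) {
        System.out.println(dropDigitWords("abc a1b 22 x9 end")); // abc end
    }
}
```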
Answer 1 (score: 0)
I have a rather complex regular expression, but it works:
\S*\d+\S*|\p{Punct}{2,}\S*|\S*\p{Punct}{2,}|[\p{Punct}&&[^-]]+|(?<![a-z])\-(?![a-z])
Explanation:
Match this alternative «\S*\d+\S*»
Match a single character that is NOT a “whitespace character” «\S*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match a single character that is a “digit” «\d+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match a single character that is NOT a “whitespace character” «\S*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Or match this alternative «\p{Punct}{2,}\S*»
Match a character from the POSIX character class “punct” «\p{Punct}{2,}»
Between 2 and unlimited times, as many times as possible, giving back as needed (greedy) «{2,}»
Match a single character that is NOT a “whitespace character” «\S*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Or match this alternative «\S*\p{Punct}{2,}»
Match a single character that is NOT a “whitespace character” «\S*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match a character from the POSIX character class “punct” «\p{Punct}{2,}»
Between 2 and unlimited times, as many times as possible, giving back as needed (greedy) «{2,}»
Or match this alternative «[\p{Punct}&&[^-]]+»
Match a single character present in the list below «[\p{Punct}&&[^-]]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
A character from the POSIX character class “punct” «\p{Punct}»
Except the literal character “-” «&&[^-]»
Or match this alternative «(?<![a-z])\-(?![a-z])»
Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<![a-z])»
Match a single character in the range between “a” and “z” «[a-z]»
Match the character “-” literally «\-»
Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?![a-z])»
Match a single character in the range between “a” and “z” «[a-z]»
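To see which alternative is responsible for which kind of token, the sketch below tests each alternative on its own against whole tokens taken from the example string further down. It is a simplification: it uses `matches()` on isolated tokens, whereas the real `replaceAll` uses `find()` on the full text, so in general an alternative may also match only part of a token. The class name, array and method are hypothetical.

```java
import java.util.regex.Pattern;

public class AlternativeBreakdown {
    // The five alternatives of the regex above, in the same order, as Java string literals.
    static final String[] ALTERNATIVES = {
        "\\S*\\d+\\S*",
        "\\p{Punct}{2,}\\S*",
        "\\S*\\p{Punct}{2,}",
        "[\\p{Punct}&&[^-]]+",
        "(?<![a-z])\\-(?![a-z])"
    };

    // Returns the 1-based index of the first alternative that matches the whole token
    // (case-insensitively, like the (?i) flag in the example), or 0 if none does.
    static int firstMatchingAlternative(String token) {
        for (int i = 0; i < ALTERNATIVES.length; i++) {
            if (Pattern.compile("(?i)" + ALTERNATIVES[i]).matcher(token).matches()) {
                return i + 1;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        for (String token : new String[] {"a-b", "ab---", "-", "---a", ",", "$22", "4zzv"}) {
            System.out.println(token + " -> " + firstMatchingAlternative(token));
        }
    }
}
```

Only "a-b" matches none of the alternatives, which is why it is the one token that survives.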
Example:
public class Main {
    public static void main(String[] args) {
        String text = "a-b ab--- - ---a --- , ++++ ?%# $22 43 4zzv";
        String rx = "(?i)\\S*\\d+\\S*|\\p{Punct}{2,}\\S*|\\S*\\p{Punct}{2,}|[\\p{Punct}&&[^-]]+|(?<![a-z])\\-(?![a-z])";
        String result = text.replaceAll(rx, " ").trim(); // replace each match with a space, then trim both ends
        System.out.println(result);
    }
}
The code above prints:
a-b
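Since the question asks for a String[], one way to finish the job is to split the cleaned text on whitespace. This sketch reuses the regex from this answer unchanged; the class name, the `cleanWords` helper and the command-line usage are hypothetical, and reading the file with `Files.readString` assumes Java 11 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class CleanWords {
    // Regex from the answer above, unchanged.
    static final String RX =
        "(?i)\\S*\\d+\\S*|\\p{Punct}{2,}\\S*|\\S*\\p{Punct}{2,}|[\\p{Punct}&&[^-]]+|(?<![a-z])\\-(?![a-z])";

    // Applies the regex, then splits what is left into words.
    static String[] cleanWords(String text) {
        String cleaned = text.replaceAll(RX, " ").trim();
        // "".split(...) would return a one-element array holding "", so handle that case explicitly.
        return cleaned.isEmpty() ? new String[0] : cleaned.split("\\s+");
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: pass the path of the .txt file as the first argument.
        String text = Files.readString(Path.of(args[0])); // requires Java 11+
        System.out.println(Arrays.toString(cleanWords(text)));
    }
}
```

For example, `cleanWords("well-known r2d2 --end-- ok.")` returns the two words "well-known" and "ok".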