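I'm trying to create a regex in Java that matches the letter pattern of a particular word, so that I can find other words with the same pattern. For example, the word "tooth" has the pattern 12213, because both 't' and 'o' repeat. I want the regex to match other words with that pattern, such as "teeth".
So here is my attempt using backreferences. In this particular example, the match should fail if the second letter is the same as the first. In addition, the last letter should be different from all the others.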
String regex = "([a-z])([a-z&&[^\1]])\\2\\1([a-z&&[^\1\2]])";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher("tooth");
//This works as expected
assertTrue(m.matches());
m.reset("tooto");
//This should return false, but instead returns true
assertFalse(m.matches());
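I have verified that if I remove the last group, as in the following, it works for an example like "toot", so I know the backreferences are working up to that point: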
String regex = "([a-z])([a-z&&[^\1]])\\2\\1";
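But if I add the last group back to the end of the pattern, it is as if the backreferences inside the square brackets are no longer recognized.
Am I doing something wrong, or is this a bug?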
Answer 0 (score: 4)
Try this:
(?i)\b(([a-z])(?!\2)([a-z])\3\2(?!\3)[a-z]+)\b
Explanation:
(?i) # Match the remainder of the regex with the options: case insensitive (i)
\b # Assert position at a word boundary
( # Match the regular expression below and capture its match into backreference number 1
    ( # Match the regular expression below and capture its match into backreference number 2
        [a-z] # Match a single character in the range between “a” and “z”
    )
    (?! # Assert that it is impossible to match the regex below starting at this position (negative lookahead)
        \2 # Match the same text as most recently matched by capturing group number 2
    )
    ( # Match the regular expression below and capture its match into backreference number 3
        [a-z] # Match a single character in the range between “a” and “z”
    )
    \3 # Match the same text as most recently matched by capturing group number 3
    \2 # Match the same text as most recently matched by capturing group number 2
    (?! # Assert that it is impossible to match the regex below starting at this position (negative lookahead)
        \3 # Match the same text as most recently matched by capturing group number 3
    )
    [a-z] # Match a single character in the range between “a” and “z”
        + # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\b # Assert position at a word boundary
Code:
try {
    Pattern regex = Pattern.compile("(?i)\\b(([a-z])(?!\\2)([a-z])\\3\\2(?!\\3)[a-z]+)\\b");
    Matcher regexMatcher = regex.matcher(subjectString);
    while (regexMatcher.find()) {
        for (int i = 1; i <= regexMatcher.groupCount(); i++) {
            // matched text: regexMatcher.group(i)
            // match start: regexMatcher.start(i)
            // match end: regexMatcher.end(i)
        }
    }
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}
See it in action here. Hope this helps.
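To make the behaviour of this pattern easier to check, here is a minimal sketch that runs it over a short sample string and collects the whole-word matches. The class name WordShapeFinder, the method findAll, and the sample text are made up for illustration; only the regex itself comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordShapeFinder {
    // Regex from the answer above, unchanged.
    static final Pattern TOOTH_SHAPE = Pattern.compile(
            "(?i)\\b(([a-z])(?!\\2)([a-z])\\3\\2(?!\\3)[a-z]+)\\b");

    // Collects every whole-word match (capturing group 1) found in the text.
    static List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = TOOTH_SHAPE.matcher(text);
        while (m.find()) {
            hits.add(m.group(1));
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, chosen to include near misses.
        System.out.println(findAll("tooth teeth tooto toothy"));
    }
}
```

Note that the trailing [a-z]+ accepts one or more letters, and the lookahead (?!\3) only rules out the doubled letter. As a result, longer words such as "toothy" are also matched, and so is "toott", whose last letter repeats the first. If the last letter must differ from every earlier letter, the tail needs a lookahead for group 2 as well.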
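Answer 1 (score: 4)
If you print your regex, you get a clue as to what is wrong: the backreferences inside your bracketed groups are actually treated by Java as string escapes and turned into some odd characters, so the pattern does not work as expected. For example: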
m.reset("ooooo");
System.out.println(m.matches());
also prints
true
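The following sketch makes those odd characters visible. In a Java string literal, "\1" is an octal escape for the single character U+0001, so the bracketed classes in the question exclude control characters rather than the captured letters. The class name OctalEscapeDemo and the helper visible are illustrative names, not part of the question.

```java
import java.util.regex.Pattern;

class OctalEscapeDemo {
    // The regex from the question, copied as written in the Java source.
    static final String QUESTION_REGEX =
            "([a-z])([a-z&&[^\1]])\\2\\1([a-z&&[^\1\2]])";

    // Renders control characters as <U+XXXX> so the real pattern text can be read.
    static String visible(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x20) {
                sb.append(String.format("<U+%04X>", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Shows U+0001 and U+0002 where the backreferences were intended.
        System.out.println(visible(QUESTION_REGEX));
        // The classes only exclude control characters, so any letter is accepted.
        System.out.println(Pattern.matches(QUESTION_REGEX, "tooto"));
    }
}
```

The \\2 and \\1 outside the brackets are escaped correctly and remain real backreferences, which is why the shorter "toot" pattern behaved as expected.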
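Also, the && intersection cannot do this job, because a backreference is not recognized inside a character class; you have to use a lookahead instead. This expression works for the examples above: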
String regex = "([a-z])(?!\\1)([a-z])\\2\\1(?!(\\1|\\2))[a-z]";
The expression (?!\\1) looks ahead and asserts that the next character is not the one captured by the first group, without moving the regex cursor forward.
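The lookahead approach can be generalized to any word. Below is a minimal sketch, assuming lowercase ASCII input, that builds such a regex from an example word: each new letter gets a capturing group preceded by one negative lookahead per earlier group, and each repeated letter becomes a backreference. The class LetterPatternRegex and the method fromWord are hypothetical names introduced here; ANSWER_REGEX is the expression from this answer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class LetterPatternRegex {
    // The hand-written expression from the answer, for comparison.
    static final String ANSWER_REGEX =
            "([a-z])(?!\\1)([a-z])\\2\\1(?!(\\1|\\2))[a-z]";

    // Builds a regex matching lowercase words with the same repeat structure as `word`.
    static String fromWord(String word) {
        Map<Character, Integer> groupOf = new LinkedHashMap<>();
        StringBuilder regex = new StringBuilder();
        for (char c : word.toCharArray()) {
            Integer group = groupOf.get(c);
            if (group != null) {
                // Repeated letter: must equal the text captured earlier.
                regex.append('\\').append(group);
            } else {
                // New letter: must differ from every group captured so far.
                for (int g = 1; g <= groupOf.size(); g++) {
                    regex.append("(?!\\").append(g).append(')');
                }
                groupOf.put(c, groupOf.size() + 1);
                regex.append("([a-z])");
            }
        }
        return regex.toString();
    }

    public static void main(String[] args) {
        String generated = fromWord("tooth");
        System.out.println(generated);
        for (String candidate : new String[] {"tooth", "teeth", "tooto", "oooto"}) {
            System.out.println(candidate
                    + " answer=" + Pattern.matches(ANSWER_REGEX, candidate)
                    + " generated=" + Pattern.matches(generated, candidate));
        }
    }
}
```

Both expressions accept "tooth" and "teeth" and reject "tooto" and "oooto". The generated version captures the last letter in a group of its own, which is harmless here but shifts group numbers if more groups are appended.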