Regular expression: match strings that contain only non-repeated words

Time: 2010-05-21 02:59:47

Tags: java regex

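I have this situation (Java code): 1) A string such as "A wild adventure" should match. 2) A string with adjacent repeated words, such as "A wild wild adventure", should not match.

With this regex: .*\b(\w+)\b\s*\1\b.* I can match strings that contain adjacent repeated words.

How do I reverse this, i.e. how do I match strings that do not contain adjacent repeated words?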

1 answer:

Answer 0 (score: 6)

Use a negative lookahead, (?!pattern):

    String[] tests = {
        "A wild adventure",      // true
        "A wild wild adventure"  // false
    };
    for (String test : tests) {
        System.out.println(test.matches("(?!.*\\b(\\w+)\\s\\1\\b).*"));
    }

Explanation courtesy of Rick Measham's explain.pl:

REGEX: (?!.*\b(\w+)\s\1\b).*
NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    \b                       the boundary between a word char (\w)
                             and something that is not a word char
--------------------------------------------------------------------------------
    (                        group and capture to \1:
--------------------------------------------------------------------------------
      \w+                      word characters (a-z, A-Z, 0-9, _) (1
                               or more times (matching the most
                               amount possible))
--------------------------------------------------------------------------------
    )                        end of \1
--------------------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
    \1                       what was matched by capture \1
--------------------------------------------------------------------------------
    \b                       the boundary between a word char (\w)
                             and something that is not a word char
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
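
The lookahead above can also be wrapped in a precompiled Pattern so the regex is not recompiled on every call. This is only a minimal sketch: the class name NoAdjacentRepeats and the method hasNoAdjacentRepeat are made up for illustration, and \s+ is swapped in for the single \s of the answer on the assumption that words separated by several whitespace characters should still count as adjacent.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not from the original answer.
class NoAdjacentRepeats {

    // Assumption: \s+ instead of \s, so "wild  wild" (two spaces) is also rejected.
    private static final Pattern NO_ADJACENT_REPEAT =
            Pattern.compile("(?!.*\\b(\\w+)\\s+\\1\\b).*");

    // Returns true when no word is immediately followed by the same word.
    static boolean hasNoAdjacentRepeat(String text) {
        return NO_ADJACENT_REPEAT.matcher(text).matches();
    }

    public static void main(String[] args) {
        String[] tests = {
            "A wild adventure",         // true
            "A wild wild adventure",    // false
            "A wild  wild adventure",   // false (two spaces between the words)
            "A wild wilder adventure"   // true ("wilder" is a different word)
        };
        for (String test : tests) {
            System.out.println(test + " -> " + hasNoAdjacentRepeat(test));
        }
    }
}
```

The trailing \b after \1 is what keeps "wild wilder" from being treated as a repeat. Note that the comparison is case-sensitive, and that . does not match \n by default, so a multi-line input would need Pattern.DOTALL.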

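Note

A negative assertion only makes sense when there are other patterns you also want to match positively (see the example above). Otherwise, you can simply use the boolean complement operator ! to negate the result of matches with whatever pattern you were using before.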

    String[] tests = {
        "A wild adventure",      // true
        "A wild wild adventure"  // false
    };
    for (String test : tests) {
        System.out.println(!test.matches(".*\\b(\\w+)\\s\\1\\b.*"));
    }
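
To illustrate the distinction drawn in the note, here is a rough sketch of both cases side by side. The positive rule used here (letters only, words separated by single spaces) is a hypothetical requirement invented for the example, as are the names TitleValidator, isValidTitle and hasAdjacentRepeat. The second method uses Matcher.find(), which is one way to apply the complement approach without wrapping the pattern in .* on both sides.

```java
import java.util.regex.Pattern;

// Hypothetical class for illustration; the validation rule is an assumption.
class TitleValidator {

    // Negative lookahead combined with a positive pattern:
    // no adjacent repeated word AND only letters separated by single spaces.
    private static final Pattern VALID_TITLE =
            Pattern.compile("(?!.*\\b(\\w+)\\s\\1\\b)[A-Za-z]+(?: [A-Za-z]+)*");

    // Bare "repeat detector" with no surrounding .* because find() scans the input.
    private static final Pattern ADJACENT_REPEAT =
            Pattern.compile("\\b(\\w+)\\s\\1\\b");

    static boolean isValidTitle(String s) {
        return VALID_TITLE.matcher(s).matches();
    }

    static boolean hasAdjacentRepeat(String s) {
        return ADJACENT_REPEAT.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] tests = {
            "A wild adventure",
            "A wild wild adventure",
            "A wild adventure 2"
        };
        for (String test : tests) {
            System.out.println(test
                    + " | valid title: " + isValidTitle(test)
                    + " | no repeat: " + !hasAdjacentRepeat(test));
        }
    }
}
```

With these inputs, only the first string passes isValidTitle: the second is rejected by the lookahead and the third by the positive part of the pattern, while !hasAdjacentRepeat rejects only the second.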