RegEx to match all characters between the last uppercase word and another word in a String

Time: 2014-01-17 08:56:30

Tags: java regex regex-lookarounds

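I need to match all characters between the last uppercase word and another word in a String. Input text: The CLEVER fox JUMPED OVER the big and (Hole 2) wall in the night.

Using the RegEx: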

(?<=\b[A-Z]+\s)(.+?)(?=\sin)

The regex above gives: fox JUMPED OVER the big and (Hole 2) wall

Expected output: the big and (Hole 2) wall

Can anyone crack this?
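
For reference, the unwanted result follows from regex matching being leftmost: the engine accepts the first position where the uppercase-word condition holds, which is right after CLEVER, not after the last uppercase word. A minimal sketch that reproduces this, assuming Java; the class name LeftmostMatchDemo and the helper firstCapture are hypothetical, and the lookbehind is replaced by a consumed prefix plus a capturing group because a `+` inside a lookbehind may not be accepted by every Java version:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeftmostMatchDemo {
    // Hypothetical helper: returns group 1 of the first match, or null if there is none.
    static String firstCapture(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "The CLEVER fox JUMPED OVER the big and (Hole 2) wall in the night.";
        // Same idea as the pattern in the question, with the uppercase word
        // consumed instead of checked in a lookbehind.
        String regex = "\\b[A-Z]+\\s(.+?)(?=\\sin)";
        // The first uppercase word followed by whitespace is CLEVER,
        // so the capture starts at "fox".
        System.out.println(firstCapture(regex, text));
    }
}
```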

4 answers:

Answer 0: (score: 2)

This may not be the most efficient solution, but it seems to work:

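import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "The CLEVER fox JUMPED OVER the big wall in the night.";
// group(1): an uppercase word plus trailing whitespace; the negative lookahead
// rejects it if another uppercase word follows; group(2): the text before " in"
String regex = "(\\b[A-Z]+\\s)(?!.*\\b[A-Z]+\\b)(.+?)(\\sin)";
Matcher m = Pattern.compile(regex).matcher(text);
if (m.find()) {
    System.out.println(m.group(2)); // prints "the big wall"
}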

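It uses a negative lookahead to make sure that no further uppercase word appears later in the text before the desired data is captured.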
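
To see the lookahead doing the filtering, the sketch below lists every uppercase word that survives it. It is only an illustration under assumptions: the class name LookaheadFilterDemo and the method lastUppercaseWords are hypothetical, and the sentence is the one from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadFilterDemo {
    // Hypothetical helper: collects the uppercase words that are followed by
    // whitespace and have no other uppercase word after them on the same line.
    static List<String> lastUppercaseWords(String text) {
        Pattern p = Pattern.compile("\\b([A-Z]+)\\s(?!.*\\b[A-Z]+\\b)");
        Matcher m = p.matcher(text);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "The CLEVER fox JUMPED OVER the big and (Hole 2) wall in the night.";
        // CLEVER and JUMPED are rejected because another uppercase word follows;
        // "Hole" does not count because it is not entirely uppercase.
        System.out.println(lastUppercaseWords(text)); // prints [OVER]
    }
}
```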

Answer 1: (score: 1)

You can simply exclude uppercase characters in the second matching expression:

(?<=\b[A-Z]+\s)([^A-Z]+)(?=\sin)

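This forces the first part to match "The CLEVER fox JUMPED OVER", the second matching expression yields "the big wall", and the last part matches the only " in" sequence in your test sentence.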
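
Running the idea on two sentences shows both what it does and where it stops working. The sketch below is an illustration under assumptions: the class name ExcludeUppercaseDemo and the helper capture are hypothetical, and the lookbehind is swapped for a consumed prefix with a capturing group, since a `+` inside a lookbehind may not compile on every Java version.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExcludeUppercaseDemo {
    // Uppercase word, whitespace, then a run of non-uppercase characters before " in".
    static final String REGEX = "\\b[A-Z]+\\s([^A-Z]+)(?=\\sin)";

    // Hypothetical helper: returns group 1 of the first match, or null if there is none.
    static String capture(String text) {
        Matcher m = Pattern.compile(REGEX).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // No uppercase letter between OVER and " in": the capture succeeds.
        System.out.println(capture("The CLEVER fox JUMPED OVER the big wall in the night."));
        // The "H" in "(Hole 2)" is excluded by [^A-Z], so nothing matches here.
        System.out.println(capture("The CLEVER fox JUMPED OVER the big and (Hole 2) wall in the night."));
    }
}
```

With this approach any uppercase letter between the last uppercase word and " in" prevents a match, so it fits sentences like the first one but not the second.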

Answer 2: (score: 1)

How about:

[A-Z][\s.](?!.*?[A-Z])(.*)\sin

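Explanation: find an uppercase letter followed by whitespace or a literal dot, with no uppercase letter anywhere after it. Then capture everything up to, but not including, the whitespace that precedes the given word.

Only the wanted part is captured.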
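
One detail worth checking is which occurrence of the given word ends the capture. Because `(.*)` is greedy, it runs up to the last " in" on the line, while a reluctant `(.*?)` stops at the first one. A small sketch of the difference, assuming Java; the class name GreedyCaptureDemo, the helper capture, and the second " in" added to the sample sentence are all made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyCaptureDemo {
    // The pattern from this answer, and a variant with a reluctant capture group.
    static final String GREEDY = "[A-Z][\\s.](?!.*?[A-Z])(.*)\\sin";
    static final String LAZY = "[A-Z][\\s.](?!.*?[A-Z])(.*?)\\sin";

    // Hypothetical helper: returns group 1 of the first match, or null if there is none.
    static String capture(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Sample sentence with two occurrences of " in" (an assumption for this demo).
        String text = "The CLEVER fox JUMPED OVER the big wall in the night in winter.";
        System.out.println(capture(GREEDY, text)); // prints "the big wall in the night"
        System.out.println(capture(LAZY, text));   // prints "the big wall"
    }
}
```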

Regards

Answer 3: (score: 0)

How about:

^.*(?:\b[A-Z]+\b)(.+?)(?=\sin)

Explanation

The regular expression:

(?-imsx:^.*(?:\b[A-Z]+\b)(.+?)(?=\sin))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    \b                       the boundary between a word char (\w)
                             and something that is not a word char
----------------------------------------------------------------------
    [A-Z]+                   any character of: 'A' to 'Z' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \b                       the boundary between a word char (\w)
                             and something that is not a word char
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    .+?                      any character except \n (1 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (?=                      look ahead to see if there is:
----------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
    in                       'in'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
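
When trying this pattern in Java, note that group 1 begins directly after the uppercase word, so it includes the whitespace that follows it. The sketch below shows this and one possible adjustment (consuming one `\s` before the group); it is an illustration under assumptions, with the hypothetical class name LeadingSpaceDemo and helper capture, and the sentence taken from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeadingSpaceDemo {
    // The pattern from this answer.
    static final String ORIGINAL = "^.*(?:\\b[A-Z]+\\b)(.+?)(?=\\sin)";
    // Variant that consumes the whitespace after the uppercase word (an assumption, not part of the answer).
    static final String TRIMMED = "^.*\\b[A-Z]+\\b\\s(.+?)(?=\\sin)";

    // Hypothetical helper: returns group 1 of the first match, or null if there is none.
    static String capture(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "The CLEVER fox JUMPED OVER the big and (Hole 2) wall in the night.";
        // Brackets make the leading space visible.
        System.out.println("[" + capture(ORIGINAL, text) + "]"); // prints "[ the big and (Hole 2) wall]"
        System.out.println("[" + capture(TRIMMED, text) + "]");  // prints "[the big and (Hole 2) wall]"
    }
}
```

Calling trim() on the captured group would be another way to drop the leading space.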