Question

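I need to match all characters between the last uppercase word in a String and another word. Input text: The CLEVER fox JUMPED OVER the big and (Hole 2) wall in the night.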

Using this regex:

(?<=\b[A-Z]+\s)(.+?)(?=\sin)

The regex above matches fox JUMPED OVER the big and (Hole 2) wall

Expected output: the big and (Hole 2) wall
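A minimal Java sketch of why the regex starts too early. Java is assumed because Answer 1 uses it, and the class and method names are hypothetical. It replaces the lookbehind with a plain capturing group, an assumption made because not every regex engine accepts an unbounded quantifier inside a lookbehind. `find()` returns the leftmost match, which begins at CLEVER, the first all-caps word followed by whitespace, and the lazy `.+?` then extends only as far as the first ` in`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeftmostMatchDemo {
    // Question's idea with a capturing group instead of the lookbehind (assumption)
    static final Pattern QUESTION_LIKE = Pattern.compile("\\b[A-Z]+\\s(.+?)(?=\\sin)");

    static String firstCapture(String text) {
        Matcher m = QUESTION_LIKE.matcher(text);
        // find() returns the leftmost match, so group 1 begins right after "CLEVER "
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "The CLEVER fox JUMPED OVER the big and (Hole 2) wall in the night.";
        System.out.println(firstCapture(text)); // fox JUMPED OVER the big and (Hole 2) wall
    }
}
```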

Can anyone crack this?

Answer 1

This may not be the most efficient solution, but it seems to work:

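import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "The CLEVER fox JUMPED OVER the big wall in the night.";
// Group 2 holds the text between the last all-caps word and " in"
String regex = "(\\b[A-Z]+\\s)(?!.*\\b[A-Z]+\\b)(.+?)(\\sin)";
Matcher m = Pattern.compile(regex).matcher(text);
if (m.find()) {
    System.out.println(m.group(2)); // prints "the big wall"
}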

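It uses a negative lookahead to make sure no further uppercase word appears in the rest of the text before capturing the desired data.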
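The same pattern also copes with the question's actual sentence containing (Hole 2): `\b[A-Z]+\b` cannot match the H in Hole, because there is no word boundary between H and o, so the negative lookahead still treats OVER as the last all-caps word. A small check, using a hypothetical class and helper named `afterLastCapsWord`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastCapsWordDemo {
    // Answer 1's pattern; group 2 is the text after the last all-caps word
    static final Pattern LAST_CAPS =
            Pattern.compile("(\\b[A-Z]+\\s)(?!.*\\b[A-Z]+\\b)(.+?)(\\sin)");

    static String afterLastCapsWord(String text) {
        Matcher m = LAST_CAPS.matcher(text);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // "Hole" is not all caps: \b[A-Z]+\b cannot match its H,
        // so OVER still counts as the last uppercase word
        String text = "The CLEVER fox JUMPED OVER the big and (Hole 2) wall in the night.";
        System.out.println(afterLastCapsWord(text)); // the big and (Hole 2) wall
    }
}
```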

Answer 2

You can simply exclude uppercase characters in the second match expression:

(?<=\b[A-Z]+\s)([^A-Z]+)(?=\sin)

This forces the match to begin after The CLEVER fox JUMPED OVER: the second expression yields the big wall, and the lookahead matches the only " in" in your test sentence.
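A sketch of Answer 2's idea in Java. It swaps the lookbehind for an ordinary capturing group (an assumption, so the pattern does not depend on unbounded lookbehind support); the `[^A-Z]+` middle part is unchanged. Because that class rejects every uppercase letter, it also stops at the H in the question's (Hole 2) sentence, which therefore yields no match. The class and the helper name `middle` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NoCapsMiddleDemo {
    // Answer 2's [^A-Z]+ middle part, with a capturing group instead of the lookbehind
    static final Pattern NO_CAPS_MIDDLE = Pattern.compile("\\b[A-Z]+\\s([^A-Z]+)(?=\\sin)");

    static String middle(String text) {
        Matcher m = NO_CAPS_MIDDLE.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(middle("The CLEVER fox JUMPED OVER the big wall in the night."));
        // the big wall
        // [^A-Z]+ stops at the H of "Hole", so no " in" can follow: no match
        System.out.println(middle("The CLEVER fox JUMPED OVER the big and (Hole 2) wall in the night."));
        // null
    }
}
```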

Answer 3

How about:

[A-Z][\s.](?!.*?[A-Z])(.*)\sin

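Explanation: find an uppercase letter followed by a whitespace character (or a period) that is not followed anywhere later by another uppercase letter. Then capture everything up to, but not including, the whitespace that precedes the given word.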

Only the wanted part is captured.
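Answer 3's pattern can be checked the same way (Java, hypothetical class and method names). Since the lookahead `(?!.*?[A-Z])` forbids any later uppercase letter, not just a later all-caps word, the H in Hole makes the question's (Hole 2) sentence fail, while the sentence without it yields the big wall.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastCapitalLetterDemo {
    // Answer 3's pattern: an uppercase letter followed by whitespace or '.',
    // with no uppercase letter anywhere after it; group 1 runs up to the last " in"
    static final Pattern ANSWER3 = Pattern.compile("[A-Z][\\s.](?!.*?[A-Z])(.*)\\sin");

    static String capture(String text) {
        Matcher m = ANSWER3.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(capture("The CLEVER fox JUMPED OVER the big wall in the night."));
        // the big wall
        System.out.println(capture("The CLEVER fox JUMPED OVER the big and (Hole 2) wall in the night."));
        // null: the H of "Hole" makes the lookahead fail after OVER
    }
}
```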

Regards

Answer 4

How about:

^.*(?:\b[A-Z]+\b)(.+?)(?=\sin)

Explanation

The regular expression:

(?-imsx:^.*(?:\b[A-Z]+\b)(.+?)(?=\sin))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    \b                       the boundary between a word char (\w)
                             and something that is not a word char
----------------------------------------------------------------------
    [A-Z]+                   any character of: 'A' to 'Z' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \b                       the boundary between a word char (\w)
                             and something that is not a word char
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    .+?                      any character except \n (1 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (?=                      look ahead to see if there is:
----------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
    in                       'in'
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
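A Java sketch applying Answer 4's pattern to the question's sentence (hypothetical class and method names). The greedy `^.*` backtracks from the end of the string, so the non-capturing group settles on the rightmost all-caps word, OVER. The capture starts with the space that follows OVER; the brackets in the output make that visible, and `trim()` can remove it if needed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyPrefixDemo {
    // Answer 4's pattern: the greedy ^.* backtracks to the rightmost all-caps word
    static final Pattern ANSWER4 = Pattern.compile("^.*(?:\\b[A-Z]+\\b)(.+?)(?=\\sin)");

    static String capture(String text) {
        Matcher m = ANSWER4.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "The CLEVER fox JUMPED OVER the big and (Hole 2) wall in the night.";
        // Brackets show that the capture keeps the space after OVER
        System.out.println("[" + capture(text) + "]"); // [ the big and (Hole 2) wall]
    }
}
```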

RegEx to match all characters between the last uppercase word and another word in a String