Question

我正在尝试使用正则表达式并遇到以下问题。

说，我的行以batman开头和结尾，中间有一些任意数字，我想要捕获组中的数字以及单词batman。

batman 12345 batman
batman 234 batman
batman 35655 batman
batman 1311 batman

这很容易实现（简单的一个=＆gt; (\s*batman (\d+) batman\s*) DEMO）。

现在我尝试了一点......将相同的数据放在capture tag (#capture)

中

#capture
batman 12345 batman
batman 234 batman
batman 35655 batman
batman 1311 batman
#capture

#others
batman 12345 batman
batman 234 batman
batman 35655 batman
batman 1311 batman
#others

我试图仅在#capture和我尝试

之间捕捉线条

(?:#capture)(\s*batman (\d+) batman\s*)*(?:#capture)

匹配模式但仅包括捕获组中的最后一次迭代，即$1=>batman $2=>1311 $1=>batman DEMO

我还尝试使用

捕获重复组

(?:#capture)((\s*batman (\d+) batman\s*)*)(?:#capture)

这个捕获了所有内容..但是在不同的组中.. DEMO

有人可以帮我理解和解决这个问题吗？

预期结果：仅捕获#capture中的群组和群组中的所有数字，以便轻松替换。

感谢。

Answer 1

您可以在.NET正则表达式风格中利用非固定宽度的后视，并使用此正则表达式：

(?s)(?<=#capture.*?)(?:batman (\d+) batman)(?=.*?#capture)

enter image description here

但是，此示例适用于您提供的案例（例如，如果文本中还有更多#capture...#capture块，它将无法工作），您只需添加更多基于标签上下文。

在PCRE / Perl中，您可以通过声明我们想要跳过的内容来获得类似的结果：

(?(DEFINE)                          # Definitions
    (?<skip>\#others.*?\#others)    # What we should skip
)
(?&skip)(*SKIP)(*FAIL)              # Skip it
|
(?<needle>batman\s+(\d+)\s+batman)  # Match it

然而，请替换为batman new-$3 batman。

请参阅此demo on regex101。

Answer 2

由于PCRE无法像.net框架或Python的新正则表达式模块那样存储重复捕获，因此有可能使用\G功能并进行检查以确保块的结尾是达到。

\G锚点标记上一场比赛结束时的位置，并用于全球研究环境（preg_match_all或preg_replace*）。找到连续的结果很有用。请注意，直到第一个匹配\G默认标记字符串的开头。因此，为防止\G在字符串的开头成功，您需要添加否定前瞻(?!\A)。

$pattern = '~
(?:        # two possible branches
    \G(?!\A)       # the contiguous branch
  |
    [#]capture \R  # the start branch: only used for the first match
)
(batman \h+ ([0-9]+) \h+ batman)
\R    # alias for any kind of newlines 
(?: ([#]) (?=capture) )?  # the capture group 3 is used as a flag
                          # to know if the end has been reached.
                          # Note that # is not in the lookahead to
                          # avoid the start branch to succeed
~x';

if (preg_match_all($pattern, $text, $matches) && array_pop($matches[3])) {
    print_r($matches[1]);
}

重复捕获组与使用嵌套模式捕获重复组

2 个答案: