Question

My data is:

Hello
Test1
Begin
* nm: 866 444 988
* nm: 08 66
# allowed * nm: 77 2
End
* nm: 0

I want to capture every number between the markers Begin and End, and only on lines that start with

* nm: or # allowed * nm:

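My pattern works fine in .NET (I use the capture collection), but it does not work in other engines... My question is: how can I add another \G anchor to capture the nested numbers? (The question is about mastering the \G anchor.)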

(?mxi:
  \G(?!\A)(?:^\#[ ]allowed[ ])?
   |
  ^Begin\r\n
 )
 \*[ ]nm:[ ]
 (?>(?'digit'\d+)|[ ])+ # the problem is here: it returns all digits in one group
 \r\n

I want each number returned as its own capture value.

Thanks.

EDIT: I found a solution, but it is not an elegant pattern:

(?mx:
   \G(?!\A)
      |
   ^Begin\r?\n
)
(?:\#[ ]allowed[ ])?
\*[ ]nm:
  |
(?!^)\G[ ]*(\d+)\s*

DEMO

EDIT 2:

Another problem with my second pattern: if I put [ ]*\r?\n at the end of the pattern instead of \s*, it fails. Why?

 (?xm:
     \G(?!\A)
         |
     ^Begin\r?\n
 )
 (?:\#[ ]allowed[ ])?
 \*[ ]nm:
     |
 (?!^)\G[ ]*(\d+)
 [ ]*\r?\n # <-- the problem here
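
One way to see what happens in EDIT 2 is to run both endings side by side. The sketch below is a Java port of the second pattern (Java also supports \G); the class, field and method names are made up for illustration, the patterns are written without free-spacing mode so that literal spaces and # need no escaping, and the sample uses \n line endings. With [ ]*\r?\n every number must be followed by a line break, so 866 (which is followed by another number) cannot match, and since \G only matches where the previous match ended, the chain appears to break right there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative and not from the original post.
class TrailingNewlineDemo {
    static final String SAMPLE =
        "Hello\nTest1\nBegin\n* nm: 866 444 988\n* nm: 08 66\n"
        + "# allowed * nm: 77 2\nEnd\n* nm: 0";

    // Both alternatives of the second pattern, minus the ending under test.
    // The ending appended below belongs to the second alternative only.
    static final String HEAD =
        "(?:\\G(?!\\A)|^Begin\\r?\\n)(?:# allowed )?\\* nm:"
        + "|(?!^)\\G *(\\d+)";

    static final Pattern WITH_WHITESPACE =
        Pattern.compile(HEAD + "\\s*", Pattern.MULTILINE);
    static final Pattern WITH_NEWLINE =
        Pattern.compile(HEAD + " *\\r?\\n", Pattern.MULTILINE);

    // Collects group 1 from every match that actually captured a number
    // (matches of the line header leave group 1 unset).
    static List<String> numbers(Pattern p, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            if (m.group(1) != null) {
                out.add(m.group(1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(numbers(WITH_WHITESPACE, SAMPLE)); // [866, 444, 988, 08, 66, 77, 2]
        System.out.println(numbers(WITH_NEWLINE, SAMPLE));    // []
    }
}
```

With the \s* ending, the whitespace after a number is consumed whether it is a space or a line break, so the next match can start exactly at the next number or the next line header.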

Answer 1

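The number from each match is in group 1. It won't be a capture collection, but that is why \G is there anyway. Also, because of how it works, the match position is only invalidated once End is found.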

Edit - Note that you can put a capture group around (Begin) to flag the start of a new block.

 # (?mi:(?!\A)\G|(?:(?:^Begin|(?!\A)\G)(?s:(?!^End).)*?(?:^(?:\#[ ]+allowed[ ]+)?\*[ ]+nm:)))[ ]+(\d+)

 (?xmi:
      (?! \A )
      \G 
   |  
      (?:
           (?:
                ^ Begin  
             |  
                (?! \A )
                \G 
           )
           (?s:
                (?! ^ End )
                . 
           )*?
           (?:
                ^ 
                (?: \# [ ]+ allowed [ ]+ )?
                \* [ ]+ nm: 
           )
      )
 )
 [ ]+  
 ( \d+ )                            # (1)

With additional comments:

 (?xmi:
      (?! \A )                # Here, matched before, give '[ ]+\d+' a first chance
      \G                      # to match again.
   |  
      (?:                     # Here, could have matched before
           (?:
                ^ Begin                 # Give a new begin position first chance
             |                        # or,
                (?! \A )                # See if this matched before
                \G 
           )

           # If this is new begin or matched before, move the position up to
           # the first/next delimiter 'nm:'

           (?s:                    # Lazy, move the position along (dot-all in this cluster)
                (?! ^ End )
                . 
           )*?
           (?:                     # Here we found the first/next delimiter
                ^ 
                (?: \# [ ]+ allowed [ ]+ )?
                \* [ ]+ nm: 
           )
      )
 )
 [ ]+  
 ( \d+ )                 # (1)
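
As a rough check of the pattern above, here is a minimal Java sketch that runs the single-line form of the regex over the sample data from the question. The class and method names are hypothetical, the sample is assumed to use \n line endings, and the regex is split across string literals only so that each part can carry a Java comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative and not from the answer.
class LazyScanDemo {
    static final String SAMPLE =
        "Hello\nTest1\nBegin\n* nm: 866 444 988\n* nm: 08 66\n"
        + "# allowed * nm: 77 2\nEnd\n* nm: 0";

    static final Pattern NUMBERS = Pattern.compile(
        "(?mi:"
        + "(?!\\A)\\G"                               // continue right after the previous number
        + "|(?:(?:^Begin|(?!\\A)\\G)"                // or start at Begin / at the previous match
        + "(?s:(?!^End).)*?"                         // advance lazily, never onto a line starting with End
        + "(?:^(?:\\#[ ]+allowed[ ]+)?\\*[ ]+nm:))"  // up to the next line header
        + ")[ ]+(\\d+)");                            // one number per match, in group 1

    // Returns group 1 of every successive match.
    static List<String> numbers(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = NUMBERS.matcher(text);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(numbers(SAMPLE)); // [866, 444, 988, 08, 66, 77, 2]
    }
}
```

The `* nm: 0` line after End is not reported because the lazy scan refuses to step onto the End line, so no later match can be chained to the previous one through \G.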

Answer 2

You can use this pattern (Java / PCRE / Perl / .NET version):

(?xm)  # switch on freespacing mode and multiline mode*
(?: \G(?!\A) | ^Begin\r?$ )  # two entry points: the end of the last match OR
                             # "Begin" that starts and ends a line

(?> \n  # a new line can start with:
    (?:
        (?:\Q# allowed \E)? \Q* nm:\E  # 1) the start of a line with numbers,
      |
        (?=End\r?$)                    # 2) the last line (End) of a block,
      |
        .*                             # 3) or any other full line
    )  
)*  # this group is optional to allow several consecutive numbers,
    # but the branch 3) can be repeated several times until the branch 1)
    # matches and the first number is found, or until the branch 2) matches
    # and closes the block.
\Q \E      # a space
(\d+)  \r? # the number

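(*) Be careful with multiline mode and Ruby: in other languages, multiline mode changes the meaning of the ^ and $ anchors from "start of the string" and "end of the string" to "start of the line" and "end of the line". In Ruby, multiline mode lets the dot match newlines (the equivalent of "singleline" or "dotall" mode in other languages). In Ruby, ^ and $ match at the start and end of a line by default, whatever the mode.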

This relies only on the fact that a number is never at the start of a line.

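When the regex engine takes branch 2) of the alternation, the pattern automatically fails, because (?=End\r?$) cannot be followed by \Q \E (\d+). Since the newline and the three branches are enclosed in an atomic group, the regex engine cannot backtrack and try branch 3). This way, contiguity is broken every time branch 2) matches.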
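
To illustrate the role of the atomic group, the Java sketch below compares the pattern with an otherwise identical variant that uses a plain non-capturing group. It rests on a few assumptions: the names are hypothetical, the \Q...\E quoting is replaced by literal text because free-spacing mode is not used here, the \r? parts are dropped, and the sample uses \n line endings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative and not from the answer.
class AtomicBranchDemo {
    static final String SAMPLE =
        "Hello\nTest1\nBegin\n* nm: 866 444 988\n* nm: 08 66\n"
        + "# allowed * nm: 77 2\nEnd\n* nm: 0";

    // Entry points: the end of the previous match, or a line that is exactly "Begin".
    static final String ENTRY = "(?:\\G(?!\\A)|^Begin$)";
    // A newline followed by branch 1) line header, 2) End lookahead, or 3) any full line.
    static final String BODY = "\\n(?:(?:# allowed )?\\* nm:|(?=End$)|.*)";
    // A space and the number itself.
    static final String TAIL = " (\\d+)";

    static final Pattern ATOMIC =
        Pattern.compile(ENTRY + "(?>" + BODY + ")*" + TAIL, Pattern.MULTILINE);
    static final Pattern NON_ATOMIC =
        Pattern.compile(ENTRY + "(?:" + BODY + ")*" + TAIL, Pattern.MULTILINE);

    // Returns group 1 of every successive match.
    static List<String> numbers(Pattern p, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(numbers(ATOMIC, SAMPLE));     // [866, 444, 988, 08, 66, 77, 2]
        System.out.println(numbers(NON_ATOMIC, SAMPLE)); // [866, 444, 988, 08, 66, 77, 2, 0]
    }
}
```

Without the atomic group, the engine can backtrack from branch 2) into branch 3), consume the End line as an ordinary line, and go on to capture the 0 that lies outside the block.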

Notice:
The \Q...\E feature lets you write literal strings without escaping special characters. In freespacing mode, all spaces inside \Q...\E are taken into account.

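To make this pattern work with Ruby, you need to remove the m modifier, remove every \Q and \E, and escape (or enclose in a character class) all spaces, special characters, and the sharp signs (#) that freespacing mode uses for comments.
Example: (?:\Q# allowed \E)? \Q* nm:\E => (?:\#[ ]allowed[ ])? \*[ ]nm: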

Allowing nested consecutive matches in regex
