How to: overlapping matches

Date: 2016-11-21 15:40:06

Tags: python regex

Let's say I have this:

A2 A1 B.         #1

A1 B.            #2

A3 A1 A8 B.      #3

How would I go about it if I wanted to:

  1. Match: A2 A1 B. and A1 B.
  2. Match: A1 B.
  3. Match: A3 A1 A8 B., A1 A8 B. and A8 B.

So far, I have this regex:

    A\d\s(.*\.)
    

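But it does not match substrings of text that has already been matched (I am using re.finditer for matching). My guess is that re.finditer is doing exactly what it is supposed to do, and I am just trying to force it to do something silly.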
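To make the problem concrete, here is a minimal sketch of what the pattern above returns on the third sample line. The variable names are only illustrative. Because each match consumes the characters it covers, the next search starts after the end of the previous match, so the shorter suffixes are never reported.

```python
import re

# The pattern from the question: "A<digit><space>", then everything up to the last dot.
pattern = re.compile(r"A\d\s(.*\.)")

text = "A3 A1 A8 B."

# finditer resumes scanning after the end of each match, so once the
# whole line has been consumed there is nothing left to match.
matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)
```

Only the full line comes back; `A1 A8 B.` and `A8 B.` lie inside text that has already been consumed.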


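1 answer:

Answer 0 (score: 2)

You can wrap the pattern in a lookahead and capture the value inside it. A lookahead consumes no characters, so the engine can report a match at every position where the pattern starts: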

regex = r"(?=((?:A\d+\s+)+B\.))"


RegEx explanation:

(?=               # start lookahead
   (              # start capturing group #1
      (?:         # start non-capturing group
         A\d+\s+  # match A followed by 1 or more digits, then 1 or more whitespace characters
      )+          # end non-capturing group; repeat it 1 or more times
      B\.         # match B and a literal dot
   )              # end capture group #1
)                 # end lookahead
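The annotated layout above can be used almost verbatim with Python's `re.VERBOSE` flag, which ignores whitespace in the pattern and treats `#` as the start of a comment. The following is a sketch under that assumption; the names `compact` and `verbose` are illustrative and not part of the original answer.

```python
import re

# Compact form, as given in the answer.
compact = re.compile(r"(?=((?:A\d+\s+)+B\.))")

# The same pattern written out with comments; re.VERBOSE strips the
# whitespace and the comments before compiling.
verbose = re.compile(r"""
    (?=                 # lookahead: asserts a match without consuming text
      (                 # group 1: the text to report
        (?:A\d+\s+)+    # one or more "A<digits><whitespace>" tokens
        B\.             # then B and a literal dot
      )
    )
""", re.VERBOSE)

text = "A3 A1 A8 B."
starts = [m.start() for m in verbose.finditer(text)]
print(starts)
print(verbose.findall(text) == compact.findall(text))
```

Each overall match is an empty string; the useful text is in group 1, and `m.start()` shows where each overlapping match begins.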

Code:

>>> import re

>>> regex = r"(?=((?:A\d+\s+)+B\.))"

>>> print(re.findall(regex, 'A2 A1 B.'))
['A2 A1 B.', 'A1 B.']

>>> print(re.findall(regex, 'A1 B.'))
['A1 B.']

>>> print(re.findall(regex, 'A3 A1 A8 B.'))
['A3 A1 A8 B.', 'A1 A8 B.', 'A8 B.']
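Since the question uses `re.finditer`, here is a hedged sketch of the same pattern with that function. The helper `overlapping_spans` and the combined `sample` string are hypothetical names introduced for illustration. The overall match is empty, so the text and its position are read from group 1.

```python
import re

OVERLAP = re.compile(r"(?=((?:A\d+\s+)+B\.))")

def overlapping_spans(text):
    """Return (start, end, matched_text) for every overlapping match."""
    # Group 0 is always empty for a pure lookahead; group 1 carries
    # the captured text and its span.
    return [(m.start(1), m.end(1), m.group(1)) for m in OVERLAP.finditer(text)]

# The three sample lines from the question, joined with newlines.
sample = "A2 A1 B.\nA1 B.\nA3 A1 A8 B."

results = overlapping_spans(sample)
for start, end, matched in results:
    print(start, end, matched)
```

Note that `\s+` also matches newlines. In this sample every line ends in `B.`, so no match runs across a line boundary, but input without that terminator could behave differently.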