Question

Python 3.3, OS X 7.5

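I am trying to find all instances of a 4-character substring defined as follows:

First character = 'N'
Second character = anything except 'P'
Third character = 'S' or 'T'
Fourth character = anything except 'P'

My query is as follows: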

re.findall(r"N[A-OQ-Z][ST][A-OQ-Z]", text)

This works except in the specific case where two substrings overlap. That case involves the following 5-character substring:

'...NNTSY...'

The query captures the first 4-character substring ('NNTS') but not the second 4-character substring ('NTSY').

This is my first attempt at regular expressions, and I am obviously missing something.

Answer 1

You can do this if the re engine does not consume the characters it matches, which is what a lookahead assertion achieves:

import re
text = '...NNTSY...'
for m in re.findall(r'(?=(N[A-OQ-Z][ST][A-OQ-Z]))', text):
    print(m)

Output:

NNTS
NTSY

Putting everything inside the assertion works, but it also feels odd. Another approach is to take the N out of the assertion:

for m in re.findall(r'(N(?=([A-OQ-Z][ST][A-OQ-Z])))', text):
    print(''.join(m))
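
If the positions of the matches matter as well, the same lookahead idea can be used with re.finditer. This is only a minimal sketch on the sample text from the question; the helper name overlapping_with_positions is made up for illustration:

```python
import re

def overlapping_with_positions(text):
    """Return (start index, 4-character match) pairs, overlapping ones included."""
    # The lookahead itself matches an empty string, so the scan advances
    # one character at a time; group 1 holds the text the lookahead saw.
    pattern = re.compile(r'(?=(N[A-OQ-Z][ST][A-OQ-Z]))')
    return [(m.start(), m.group(1)) for m in pattern.finditer(text)]

print(overlapping_with_positions('...NNTSY...'))
# [(3, 'NNTS'), (4, 'NTSY')]
```

Because nothing is consumed, each N is tested as a possible start, which is why both substrings are reported.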

Answer 2

From the Python 3 documentation (emphasis added):

$ python3 -c 'import re; help(re.findall)'
Help on function findall in module re:

findall(pattern, string, flags=0)
    Return a list of all non-overlapping matches in the string.

    If one or more capturing groups are present in the pattern, return
    a list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result.

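If you want overlapping instances, use regex.search() in a loop. You have to compile the regular expression, because the API for non-compiled regular expressions does not take an argument specifying the start position.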

import re

def findall_overlapping(pattern, string, flags=0):
    """Find all matches, even ones that overlap."""
    regex = re.compile(pattern, flags)
    pos = 0
    while True:
        match = regex.search(string, pos)
        if not match:
            break
        yield match
        pos = match.start() + 1
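
A usage sketch for the generator above, applied to the sample text from the question. The function is repeated so the snippet runs on its own; the variable names text and found are just for illustration:

```python
import re

def findall_overlapping(pattern, string, flags=0):
    """Find all matches, even ones that overlap."""
    regex = re.compile(pattern, flags)
    pos = 0
    while True:
        match = regex.search(string, pos)
        if not match:
            break
        yield match
        # Restart one character after the start of the last match,
        # not at its end, so overlapping matches are still found.
        pos = match.start() + 1

text = '...NNTSY...'
found = [m.group(0) for m in findall_overlapping(r'N[A-OQ-Z][ST][A-OQ-Z]', text)]
print(found)
# ['NNTS', 'NTSY']
```

Since the generator yields match objects rather than strings, m.start() and m.end() are available for each hit too.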

Answer 3

(N[^P](?:S|T)[^P])
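
Note that used directly with re.findall, this pattern still returns only non-overlapping matches, and [^P] accepts any character other than 'P', not just letters. A small sketch comparing it as written with the same pattern wrapped in a lookahead (the wrapping is an assumption borrowed from the lookahead approach, not part of this answer):

```python
import re

text = '...NNTSY...'

# As written: 'NNTS' is consumed, so the overlapping 'NTSY' is skipped.
plain = re.findall(r'(N[^P](?:S|T)[^P])', text)

# Wrapped in a lookahead: nothing is consumed, every position is tried.
overlapping = re.findall(r'(?=(N[^P](?:S|T)[^P]))', text)

print(plain)        # ['NNTS']
print(overlapping)  # ['NNTS', 'NTSY']
```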


Python RegEx query missing overlapping substrings
