Question

I want to match repeated occurrences of natural-language number words and capture them.

import re

r = "the ((sixty|six)[ -]+)+items"
s = "the sixty six items"
re.findall(r, s)

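It reports 'six' twice, even though it can never have matched "six six"; it must have matched "sixty six", yet the capture returns ('six ', 'six').

What is happening here, and how can I get ('sixty', 'six') back?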

Answer 1

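re.search only finds the first thing that matches the pattern; once it has found it, it does not look for further matches. You get ('six ', 'six') because you have one capturing group nested inside another: 'six ' is what the outer group matched and 'six' (without the trailing space) is what the inner group matched.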
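To see the nesting at work, here is a minimal sketch that uses the nested pattern ((sixty|six)[ -]+)+ from the question; the variable names are made up for illustration.

```python
import re

# Group 1 is the outer (...)+ and group 2 is the inner (sixty|six).
nested = re.search("the ((sixty|six)[ -]+)+items", "the sixty six items")

whole = nested.group(0)  # the full match still covers both number words
outer = nested.group(1)  # last repetition of the outer group, separator included
inner = nested.group(2)  # last repetition of the inner group

print(repr(whole), repr(outer), repr(inner))
```

The whole match spans "sixty six", but each group reports only its final repetition.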

You can do what you want with two non-nested capturing groups placed inside non-capturing groups, which use the (?:...) syntax.

import re

r = "the (?:(?:(sixty)|(six))[ -]+)+items"
s = "the sixty six items"
m = re.search(r, s)
if m:
    print(m.groups())

Output

('sixty', 'six')

This returns a tuple of two items because there are two capturing groups in the pattern.

Here is a longer demo.

import re

pat = re.compile("the (?:(?:(sixty)|(six))[ -]+)+items")

data = (
    "the items",
    "the six items",
    "the six six items",
    "the sixty items",
    "the six sixty items",
    "the sixty six items",
    "the sixty-six items",
    "the six sixty sixty items",
)

for s in data:
    m = pat.search(s)
    print('{!r} -> {}'.format(s, m.groups() if m else None))

Output

'the items' -> None
'the six items' -> (None, 'six')
'the six six items' -> (None, 'six')
'the sixty items' -> ('sixty', None)
'the six sixty items' -> ('sixty', 'six')
'the sixty six items' -> ('sixty', 'six')
'the sixty-six items' -> ('sixty', 'six')
'the six sixty sixty items' -> ('sixty', 'six')

Answer 2

If you use (group)+, only the last text matched by that group is captured.
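A small sketch of that rule, assuming a simplified pattern invented for this example (it is not the one from the question):

```python
import re

# The group (sixty|six) is repeated by the + on the surrounding non-capturing group.
m = re.fullmatch(r"(?:(sixty|six) ?)+", "sixty six")

# Both words took part in the match, but group 1 only remembers the last one.
last = m.group(1)
print(m.group(0), "->", last)
```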

You should use findall with a slightly different regex.

>>> import re
>>> s = 'the sixty six items'
>>> if re.match(r'the (?:(?:sixty|six)[ -]+)+items', s):
...     re.findall(r"\b(sixty|six)[ -]+(?=.*\bitems\b)", s)
...
['sixty', 'six']
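The same validate-then-extract idea can also be sketched as a two-step helper: capture the whole repeated run in one group, then split that run with findall. The names PHRASE, WORD and number_words are hypothetical, chosen only for this example.

```python
import re

# Step 1: one capturing group around the entire repetition, so nothing is lost.
PHRASE = re.compile(r"the ((?:(?:sixty|six)[ -]+)+)items")
# Step 2: pull the individual words out of the captured run.
# "sixty" is listed first so it is not cut short to "six".
WORD = re.compile(r"sixty|six")

def number_words(text):
    """Return every number word between 'the' and 'items', in order."""
    m = PHRASE.search(text)
    if m is None:
        return []
    return WORD.findall(m.group(1))

for sample in ("the sixty six items", "the sixty-six items", "the items"):
    print(repr(sample), "->", number_words(sample))
```

Unlike a fixed pair of groups, this keeps every repetition, so "the six sixty sixty items" would yield three words rather than two.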

Your question has the following code:

>>> r = "the ((sixty|six)[ -]+)+items"
>>> s = "the sixty six items"
>>> re.findall(r, s)

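This returns [('six ', 'six')] because of the quantifier after your outer group, i.e. ((sixty|six)[ -]+)+

findall returns 2 values:

captured group #1 is "six " (note the trailing space, matched by [ -]+ inside the first group)
captured group #2 is "six" (the inner group, i.e. (sixty|six))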

Answer 3

Use \b assertions. Hope this helps.

>>> s = "the sixty six items"
>>> print(re.findall(r'(?is)(\bsixty\b|\bsix\b)',s))
['sixty', 'six']

The \b assertion avoids false hits, for example if you add "sixteen" and do not want it to match.

Without \b:

>>> s = "the sixty sixteen six items"
>>> print(re.findall(r'(?is)(sixty|six)',s))
['sixty', 'six', 'six']

With \b (the advantage):

>>> s = "the sixty sixteen six items"
>>> print(re.findall(r'(?is)(\bsixty\b|\bsix\b)',s))
['sixty', 'six']
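As a variation, the two boundaries can be written once around the whole alternation, and the inline (?i) flag can be passed as re.IGNORECASE instead. The (?s) flag only changes how . behaves, so it has no effect on a pattern without a dot and is left out here. A minimal sketch, with WORDS as an illustrative name:

```python
import re

# \b(?:...)\b applies the word boundaries to every alternative at once.
# re.IGNORECASE is the spelled-out form of the inline (?i) flag.
WORDS = re.compile(r"\b(?:sixty|six)\b", re.IGNORECASE)

hits = WORDS.findall("The Sixty sixteen SIX items")
print(hits)
```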

Answer 4

Try this regex:

re.findall(r'(six\w*)', s)
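A quick check of what this pattern picks up, assuming the test string from Answer 3. Because \w* accepts any trailing word characters, it also returns words such as "sixteen", which may or may not be what you want.

```python
import re

s = "the sixty sixteen six items"

# six\w* grabs "six" plus any word characters that follow it.
found = re.findall(r"(six\w*)", s)
print(found)
```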

Strange results from repeated captures in Python
