Question

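I cannot figure out how to match a string but reject it when it has a trailing newline character (\n), which seems to be stripped automatically: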

import re

print(re.match(r'^foobar$', 'foobar'))
# <_sre.SRE_Match object; span=(0, 6), match='foobar'>

print(re.match(r'^foobar$', 'foobar\n'))
# <_sre.SRE_Match object; span=(0, 6), match='foobar'>

print(re.match(r'^foobar$', 'foobar\n\n'))
# None

To me, the second case should also return None. When we mark the end of a pattern with $, as in ^foobar$, it should only match a string like foobar, not foobar\n.

What am I missing?

Answer 1

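This is the defined behavior of $, as can be read in the documentation that @zvone linked to, or even on https://regex101.com:

$ asserts position at the end of the string, or before the line terminator right at the end of the string (if any)

You can use an explicit negative lookahead to counter this behavior: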

import re

print(re.match(r'^foobar(?!\n)$', 'foobar'))
# <_sre.SRE_Match object; span=(0, 6), match='foobar'>

print(re.match(r'^foobar(?!\n)$', 'foobar\n'))
# None

print(re.match(r'^foobar(?!\n)$', 'foobar\n\n'))
# None

Answer 2

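The documentation says this about the $ character:

Matches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline.

So, without the MULTILINE option, it matches exactly the first two strings you tried, 'foobar' and 'foobar\n', but not 'foobar\n\n', because there the newline right after foobar is not at the end of the string.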
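One way to see this is to list every index at which a bare $ matches. The following is a minimal sketch; the helper name dollar_positions is made up for this illustration and is not part of the re module.

```python
import re

def dollar_positions(text, flags=0):
    """Return every index at which the bare pattern `$` matches in text."""
    return [m.start() for m in re.finditer(r'$', text, flags)]

print(dollar_positions('foobar'))      # [6]
print(dollar_positions('foobar\n'))    # [6, 7]
print(dollar_positions('foobar\n\n'))  # [7, 8]
```

Index 6 (right after foobar) appears for the first two strings but not for 'foobar\n\n', which is why ^foobar$ fails only on the third one.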

On the other hand, if you choose the MULTILINE option, it will match the end of any line:

>>> re.match(r'^foobar$', 'foobar\n\n', re.MULTILINE)
<_sre.SRE_Match object; span=(0, 6), match='foobar'>

Of course, this also matches in the following case, which may or may not be what you want:

>>> re.match(r'^foobar$', 'foobar\nanother line\n', re.MULTILINE)
<_sre.SRE_Match object; span=(0, 6), match='foobar'>

To avoid matching the trailing newline, use the negative lookahead as DeepSpace wrote.
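As a quick check, here is a small sketch that assumes the lookahead pattern from Answer 1 is compiled together with re.MULTILINE (a combination the answers do not show explicitly). The lookahead still rejects any newline directly after foobar:

```python
import re

# Lookahead pattern from Answer 1, compiled with MULTILINE for this check.
pattern = re.compile(r'^foobar(?!\n)$', re.MULTILINE)

print(pattern.match('foobar'))                  # a match object
print(pattern.match('foobar\n'))                # None
print(pattern.match('foobar\nanother line\n'))  # None
```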

Answer 3

More likely, you don't want $ but rather \Z:

>>> print(re.match(r'^foobar\Z', 'foobar\n'))
None

\Z matches only at the end of the string.
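A short sketch of how \Z behaves on the three strings from the question, with and without re.MULTILINE. The helper name ends_exactly is hypothetical and exists only for this example:

```python
import re

def ends_exactly(text, flags=0):
    """True only if text is exactly 'foobar', with nothing after it."""
    return re.match(r'^foobar\Z', text, flags) is not None

for text in ('foobar', 'foobar\n', 'foobar\n\n'):
    print(repr(text), ends_exactly(text), ends_exactly(text, re.MULTILINE))
# 'foobar' True True
# 'foobar\n' False False
# 'foobar\n\n' False False
```

Unlike $, \Z is not affected by the MULTILINE flag, so the result is the same in both columns.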

Regex: don't match a string ending with a newline (\n) with the end-of-line anchor ($)
