Question

I ran into this odd behavior: the pattern works on regex101.com with the Python flavor selected, but fails to capture in actual Python 3.7:

import re
match_str = r'(?P<header>.*?)(FROG)'
pattern_comment = re.compile( match_str )

# this sort of works
txt = 'this is a FROG'
matches = pattern_comment.match(txt, re.MULTILINE)
print(matches) # <re.Match object; span=(8, 14), match='a FROG'>
print(matches['header'])  # 'a '

# this fails to capture in python, but works in regex101
txt = 'this FROG'
matches = pattern_comment.match(txt, re.MULTILINE)
print(matches)

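I don't understand why the header captured in the first example is 'a ' rather than 'this is a ', or why the capture fails entirely in the second example. I see the same behavior when using search instead of match.

Any ideas on how to capture the full text, the way regex101 does?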

Answer 1

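You are passing the flag where the start position is expected. For a compiled pattern, flags can only be supplied when the regular expression is compiled: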

import re
match_str = r'(?P<header>.*?)(FROG)'
pattern_comment = re.compile(match_str, re.MULTILINE)

txt = 'this is a FROG'
matches = pattern_comment.match(txt)
print(matches)
print(matches['header'])

txt = 'this FROG'
matches = pattern_comment.match(txt)
print(matches)

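The second parameter of Pattern.match and Pattern.search is pos. You are passing re.MULTILINE, whose integer value is 8, so matching starts at index 8 of the string. That is why the first example only sees 'a FROG', and why the second example, where only 'G' remains from index 8 onward, returns None.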
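As a quick check of the explanation above, here is a minimal sketch that compares passing re.MULTILINE positionally with passing the literal position 8. The variable names (by_flag, by_pos, short_result, fixed) are only illustrative and are not from the original question.

```python
import re

pattern = re.compile(r'(?P<header>.*?)(FROG)')

# re.MULTILINE is an integer-valued flag; its value is 8.
multiline_value = int(re.MULTILINE)

txt = 'this is a FROG'

# Passing the flag positionally to a compiled pattern fills the pos parameter,
# so these two calls should behave identically.
by_flag = pattern.match(txt, re.MULTILINE)
by_pos = pattern.match(txt, 8)

# 'this FROG' has 9 characters, so starting at index 8 leaves only 'G'.
short_result = pattern.match('this FROG', re.MULTILINE)

# Without the stray argument, matching starts at index 0.
fixed = pattern.match(txt)

print(multiline_value)
print(by_flag.span(), by_pos.span())
print(short_result)
print(repr(fixed['header']))
```

Note that the module-level functions differ here: re.match(pattern, string, flags) does accept flags as its third argument, which may be where the confusion comes from.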
