Question

I'm learning the re module in Python. I ran into something that makes no sense to me, and I don't know why it happens. Here is a small example:

import re
x = re.compile(r'(ha)*')
c = x.search('the man know how to hahahaha')
print(c.group())  # prints an empty string, no error; I expected "hahahaha"

The same thing happens if I use re.compile(r'(ha)?'):

x = re.compile(r'(ha)?')
c = x.search('the man know how to hahahaha')
print(c.group())  # prints an empty string, no error; I expected "ha"

But if I use re.compile(r'(ha)+'):

x = re.compile(r'(ha)+')
c = x.search('the man know how to hahahaha')
print(c.group())  # prints "hahahaha", just as expected

Why is that? Why do re.compile(r'(ha)*') and re.compile(r'(ha)+') behave differently in this case?

Answer 1

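The patterns r'h+' and r'h*' are not the same, which is why they do not give the same result. + means one or more occurrences of the preceding item, while * means zero or more occurrences.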
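To make the difference visible, here is a minimal sketch that runs the three patterns from the question through re.search and prints where each first match sits. The variable names (text, results) are just for illustration.

```python
import re

text = 'the man know how to hahahaha'

# For each quantifier, record the span and content of the FIRST match only.
results = {}
for pattern in (r'(ha)+', r'(ha)*', r'(ha)?'):
    m = re.search(pattern, text)
    results[pattern] = (m.span(), m.group())
    print(pattern, m.span(), repr(m.group()))
```

With + the match has to contain at least one 'ha', so the search moves forward to index 20. With * and ? an empty match at index 0 already satisfies the pattern, so the search stops there.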

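re.search gives you an empty string (not None, and not an error) because it only returns the first match. For *, the first match is the zero-occurrence case of your '(ha)' pattern at the very first letter of the string. findall shows every match: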

import re
x = re.compile(r'(ha)*')
# get _all_ matches; with one capture group, findall returns that group's content
c = x.findall('the man know how to hahahaha')
print(c)

Output:

['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'ha', '']

# t   h  e        m   a   n       k   n   o   w      h    o   w       t   o      hahahaha

The * and ? quantifiers both allow zero occurrences, so an empty match counts as a match.
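If you want to see where those empty matches come from, a small sketch with re.finditer helps. It reports the position and the full text (group(0)) of each match rather than the content of the capture group. The names matches and non_empty are made up for this example.

```python
import re

text = 'the man know how to hahahaha'

# Collect (start, end, full match text) for every match of (ha)*.
matches = [(m.start(), m.end(), m.group(0))
           for m in re.finditer(r'(ha)*', text)]

# Keep only the matches that actually consumed characters.
non_empty = [t for t in matches if t[2]]
print(non_empty)
```

All the other entries are zero-width matches, one at each position where no 'ha' starts. Note that group(0) is the whole 'hahahaha', while findall shows only 'ha' because it reports the last repetition captured by group 1.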

Documentation:

Pattern.search(string[, pos[, endpos]])
  Scan through string looking for the first location where this regular expression produces a match, ...
  (source: https://docs.python.org/3/library/re.html#re.Pattern.search)
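As a rough illustration of the "first location" wording, the optional pos argument from the signature above can move the starting point of the scan. This sketch assumes you already know (here via str.find) where the first 'ha' begins; that lookup is my addition, not part of the original question.

```python
import re

x = re.compile(r'(ha)*')
text = 'the man know how to hahahaha'

start = text.find('ha')    # index of the first literal 'ha' in the string
c = x.search(text, start)  # begin scanning at that index instead of 0
print(c.span(), c.group())
```

Starting at that index, the first location that matches is no longer an empty one, so the greedy * consumes the whole 'hahahaha'. In practice, using (ha)+ as in the question is the simpler fix.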
