Question

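I am trying to find all the words containing "hell" in one sentence. The string below has 3 occurrences, but re.search returns only the first two. I tried both findall and search. Can someone tell me what is wrong here?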

>>> s = 'heller pond hell hellyi'
>>> m = re.findall('(hell)\S*', s)
>>> m.group(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'list' object has no attribute 'group'
>>> m = re.search('(hell)\S*', s)
>>> m.group(0)
'heller'
>>> m.group(1)
'hell'
>>> m.group(2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: no such group
>>>

Answer 1

You can use re.findall and search for hell with zero or more word characters on either side:

>>> import re
>>> s = 'heller pond hell hellyi'
>>> re.findall(r'\w*hell\w*', s)
['heller', 'hell', 'hellyi']
>>>
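One caveat: \w matches only letters, digits, and the underscore, so a word containing a hyphen or apostrophe is cut off at that character, while \S* keeps going until the next whitespace. A small sketch of the difference, using a made-up sample string that is not from the question:

```python
import re

# Hypothetical sample (not from the question): the first word contains a hyphen.
sample = "hell-bent shell hello"

# \w* stops at the hyphen, so only "hell" is returned for the first word.
word_chars = re.findall(r"\w*hell\w*", sample)

# \S* runs to the next whitespace, so the hyphenated word is kept whole.
non_space = re.findall(r"\S*hell\S*", sample)

print(word_chars)
print(non_space)
```

Which of the two is appropriate depends on what should count as a "word" in your input.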

Answer 2

You can use str.split and check whether the substring is in each word:

s = 'heller pond hell hellyi'

print([w for w in s.split() if "hell" in w])

Answer 3

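Your attempt does not return every hell because re.search stops at the first match, and group(1) and group(2) index capture groups inside that single match rather than successive matches. Instead, just look for the literal hell with re.findall, nothing fancy.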

In [3]: re.findall('hell', 'heller pond hell hellyi')
Out[3]: ['hell', 'hell', 'hell']
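If a match object per occurrence is wanted (for example, to read positions or capture groups), re.finditer is one option. A minimal sketch reusing the pattern from the question; the variable names are only illustrative:

```python
import re

s = 'heller pond hell hellyi'

# finditer yields one match object per non-overlapping occurrence,
# whereas re.search would return only the first of them.
matches = list(re.finditer(r'(hell)\S*', s))

words = [m.group(0) for m in matches]   # the full text of each match
starts = [m.start() for m in matches]   # index in s where each match begins

print(words)
print(starts)
```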

Edit

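Based on your comment, you want the whole word returned when hell is found in the middle of it. In that case, you should use the * quantifier (zero or more).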

In [4]: re.findall(r"\S*hell\S*", 'heller pond hell hellyi')
Out[4]: ['heller', 'hell', 'hellyi']

In other words:

re.compile(r"""
    \S*   # zero or more non-space characters
    hell  # followed by a literal hell
    \S*   # followed by zero or more non-space characters
""", re.X)
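A sketch of how such a verbose pattern might be used. The name pattern is introduced here for illustration. With re.X, whitespace and # comments inside the pattern are ignored, so each piece must sit on its own line for the comments not to swallow the rest of the expression:

```python
import re

# Verbose-mode pattern: layout and comments are ignored by the regex engine.
pattern = re.compile(r"""
    \S*    # zero or more non-space characters
    hell   # followed by a literal hell
    \S*    # followed by zero or more non-space characters
""", re.X)

result = pattern.findall('heller pond hell hellyi')
print(result)
```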

Note that Padraic's answer is definitely the best way to solve this:

[word for word in "heller pond hell hellyi".split() if 'hell' in word]

Answer 4

Maybe it's just me, but I rarely use regular expressions. Python 3 has extensive text functions; what is wrong with using the built-in ones?

'heller pond hell hellyi'.count('hell')

The only drawback I can see is that this way I never really learn to use regular expressions. :-)
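Keep in mind that str.count returns the number of non-overlapping substring occurrences, not the words themselves. A small sketch of the difference; the second string is a made-up example, not from the question:

```python
s = 'heller pond hell hellyi'

occurrences = s.count('hell')                   # number of substring occurrences
words = [w for w in s.split() if 'hell' in w]   # the words that contain it
print(occurrences, words)

# Hypothetical input where the two disagree: one word contains hell twice.
tricky = 'hellhell pond'
tricky_count = tricky.count('hell')
tricky_words = [w for w in tricky.split() if 'hell' in w]
print(tricky_count, tricky_words)
```

For the question's string the count happens to equal the number of matching words, but that need not hold in general.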

Finding all occurrences of a word in a string in Python 3
