Question

As the title says.

I'm new to Python and regular expressions. I need to search a paragraph for a specific word and show the indices of all its occurrences.

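For example:

The paragraph is:

This is a testing text and used to test and test and test.

and the word is:

test

The algorithm should return the indices of the 3 non-overlapping occurrences of the word test in the paragraph above (but not testing, because I mean a whole-word search, not just a substring match).

Another example with the same paragraph and this "word":

test and

The algorithm should return the 2 occurrences of test and.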

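I guess I have to use a regex that matches the whole word only when it is preceded and followed by punctuation such as . , ; ? -

After googling, I found that something like re.finditer should be used, but I haven't found the right way to do it yet. Please help, and thanks in advance. ;)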

Answer 1

Yes, finditer is the way to go. Use start() to get the index of each match.

Example:

import re

a = "This is a testing text and used to test and test and test."

# \b anchors the match at word boundaries, so "testing" is skipped
print([m.start() for m in re.finditer(r"\btest\b", a)])
print([m.start() for m in re.finditer(r"\btest and\b", a)])

Output:

[35, 44, 53]
[35, 44]
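If the search term comes from user input rather than being hard-coded, the same idea can be wrapped in a small function. This is only a sketch: the name find_word_indices is made up for illustration, and it assumes the term starts and ends with a word character (letter, digit, or underscore), since \b only behaves as expected in that case. re.escape keeps any regex metacharacters in the term literal.

```python
import re

def find_word_indices(word, text):
    """Return start indices of whole-word, non-overlapping matches of word in text."""
    # re.escape treats the search term as literal text, not as a pattern
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    return [m.start() for m in pattern.finditer(text)]

text = "This is a testing text and used to test and test and test."
print(find_word_indices("test", text))      # [35, 44, 53]
print(find_word_indices("test and", text))  # [35, 44]
```

finditer already scans left to right and never returns overlapping matches, so no extra bookkeeping is needed for the non-overlapping requirement.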

Answer 2

Use the word-boundary anchor \b in your regex to indicate that the match must start and end at a word boundary.

>>> import re
>>> sentence = "This is a testing text and used to test and test and test."
>>> pattern = re.compile(r'\btest\b')
>>> [m.start() for m in pattern.finditer(sentence)]
[35, 44, 53]
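To see what counts as a word boundary, here is a quick check against a few strings (the sample strings are made up for illustration). \b matches between a word character (letter, digit, or underscore) and a non-word character or the string edge, so there is no need to list the punctuation marks by hand. One thing to watch for: the underscore is a word character, so test_case does not match.

```python
import re

pattern = re.compile(r"\btest\b")

# Hypothetical sample strings to probe the boundary behaviour
samples = ["test.", "(test)", "testing", "retest", "test_case", "test-case"]
results = [bool(pattern.search(s)) for s in samples]

for s, matched in zip(samples, results):
    print(s, matched)
# test. True
# (test) True
# testing False
# retest False
# test_case False
# test-case True
```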

How to use regex to find a specific word from text and return all occurrences?