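I need to write code that uses a regular expression to search an Excel sheet of grouped sentences. I managed to find the keywords that represent each sentence. When I run the code mentioned below, it finds only one keyword per cell and then moves on to the next cell. I have tried to show the requirement in a table.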
\bphrase\W+(?:\w+\W+){0,6}?one\b|\bphrase\W+(?:\w+\W+){0,6}?two\b|\bphrase\W+(?:\w+\W+){0,6}?three\b|\bphrase\W+(?:\w+\W+){0,6}?four\b|
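The question does not show the loop over the cells, so the following rests on an assumption. "Only one keyword per cell" is the behaviour of `re.search`, which returns only the first match in a string. `re.finditer` and `re.findall` scan the whole string instead. The minimal sketch below contrasts the two on one hypothetical cell value. It uses the combined pattern from the answer and leaves out the trailing `|` of the pattern above, because that `|` adds an empty alternative that matches the empty string.

```python
import re

# Combined form of the four alternatives above (without the trailing "|").
PATTERN = re.compile(r'\b(phrase)\b\W+(?:\w+\W+){0,6}?\b(one|two|three|four)\b')

# Hypothetical contents of a single Excel cell.
cell = "phrase one and phrase word two"

# re.search stops at the first match in the string.
first = PATTERN.search(cell)
print(first.group(0))  # phrase one

# re.finditer yields every non-overlapping match, left to right.
all_matches = [m.group(0) for m in PATTERN.finditer(cell)]
print(all_matches)  # ['phrase one', 'phrase word two']
```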
Answer (score: 0)
Regex:
\b(phrase)\b\W+(?:\w+\W+){0,6}?\b(one|two|three|four)\b
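\b(phrase)\b
Matches phrase on word boundaries.
\W+
Matches one or more non-word characters (usually spaces).
(?:\w+\W+){0,6}?
Matches one or more word characters followed by one or more non-word characters, between 0 and 6 times, as few times as possible (lazy).
\b(one|two|three|four)\b
Matches one, two, three, or four on word boundaries.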
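The sketch below shows the `{0,6}?` limit in practice. It rewrites the same pattern in `re.VERBOSE` mode so that each part can carry its own comment. It then checks the pattern against hypothetical inputs that put six words and then seven words between phrase and four. `gap_text` is a made-up helper that builds those test strings.

```python
import re

# Same pattern in verbose mode: whitespace is ignored and "#" starts a comment.
PATTERN = re.compile(r"""
    \b(phrase)\b                # the word "phrase", captured as group 1
    \W+                         # one or more non-word characters (spaces)
    (?:\w+\W+){0,6}?            # 0 to 6 intervening words, as few as possible
    \b(one|two|three|four)\b    # the target word, captured as group 2
""", re.VERBOSE)

def gap_text(n):
    """Hypothetical helper: 'phrase' + n filler words + 'four'."""
    return " ".join(["phrase"] + ["word"] * n + ["four"])

six = PATTERN.search(gap_text(6))    # 6 words in between: within the limit
seven = PATTERN.search(gap_text(7))  # 7 words in between: too many
print(six.group(2) if six else None)      # four
print(seven.group(2) if seven else None)  # None
```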
Code:
import re
text = "This sentence has phrase one and phrase word word two and phrase word three and phrase four phrase too many words too many words too many words four again."
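# finditer returns every match in the string, not just the first;
# m[1] is "phrase" and m[2] is the matched target word
l = [m[1] + ' ' + m[2] for m in re.finditer(r'\b(phrase)\b\W+(?:\w+\W+){0,6}?\b(one|two|three|four)\b', text)]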
print(l)
Prints:
['phrase one', 'phrase two', 'phrase three', 'phrase four']
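For the spreadsheet, the same approach means running `finditer` once per cell and keeping every match rather than stopping at the first one. This is a minimal sketch. It assumes the cell values have already been read into a plain list of strings, for example with openpyxl or pandas, and it calls neither library. `cells` and `find_keywords` are hypothetical names.

```python
import re

PATTERN = re.compile(r'\b(phrase)\b\W+(?:\w+\W+){0,6}?\b(one|two|three|four)\b')

def find_keywords(cell_text):
    """Return every 'phrase <target>' pair found in one cell, in order."""
    if not isinstance(cell_text, str):
        return []  # empty (None) or numeric cells have nothing to match
    return [m[1] + ' ' + m[2] for m in PATTERN.finditer(cell_text)]

# Hypothetical cell values standing in for one column of the sheet.
cells = [
    "phrase one and phrase word word two",
    "nothing to see here",
    None,
    "phrase word three then phrase four",
]

for row, text in enumerate(cells, start=1):
    print(row, find_keywords(text))
```

Each cell then produces a list of all of its keywords. For example, the first cell yields `['phrase one', 'phrase two']`.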