Searching for multiple keywords in a cell using a regular expression

Time: 2019-11-25 05:21:24

Tags: python regex text-mining data-analysis orange

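I have to write code that uses a regular expression to search an Excel worksheet whose cells contain groups of sentences. I have managed to find keywords that represent each sentence. When I run the code below, it finds only one keyword per cell and then moves on to the next cell. I have tried to show the requirement in a table.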


\bphrase\W+(?:\w+\W+){0,6}?one\b|\bphrase\W+(?:\w+\W+){0,6}?two\b|\bphrase\W+(?:\w+\W+){0,6}?three\b|\bphrase\W+(?:\w+\W+){0,6}?four\b|

1 answer:

Answer 0 (score: 0)

Regular expression:

\b(phrase)\b\W+(?:\w+\W+){0,6}?\b(one|two|three|four)\b
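  1. \b(phrase)\b matches phrase on word boundaries.
  2. \W+ matches one or more non-word characters (usually spaces).
  3. (?:\w+\W+){0,6}? matches one or more word characters followed by one or more non-word characters, repeated 0 to 6 times, with as few repetitions as possible.
  4. \b(one|two|three|four)\b matches one, two, three, or four on word boundaries.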
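The four parts listed above can also be written out with Python's re.VERBOSE flag, which lets each part of the pattern carry its own comment. This is only a sketch of the same pattern; the names PATTERN, sample and too_far are made up for illustration and are not part of the original answer.

```python
import re

# Same pattern as in the answer, spread over several lines with re.VERBOSE.
PATTERN = re.compile(
    r"""
    \b(phrase)\b               # 1. the word "phrase" on word boundaries
    \W+                        # 2. one or more non-word characters
    (?:\w+\W+){0,6}?           # 3. up to six intervening words, as few as possible
    \b(one|two|three|four)\b   # 4. one of the keywords on word boundaries
    """,
    re.VERBOSE,
)

# Hypothetical sample text: three words sit between "phrase" and the keyword.
sample = "a phrase with some words two"
m = PATTERN.search(sample)
print(m.group(1), m.group(2))

# Hypothetical sample text: seven intervening words exceed the limit of six.
too_far = "phrase a b c d e f g two"
print(PATTERN.search(too_far))
```

The first print shows the two captured groups; the second shows that no match is returned once more than six words separate phrase from the keyword.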

Code:

import re

text = "This sentence has phrase one and phrase word word two and phrase word three and phrase four phrase too many words too many words too many words four again."

# re.finditer yields every non-overlapping match, so all keywords in the text are collected.
pattern = r'\b(phrase)\b\W+(?:\w+\W+){0,6}?\b(one|two|three|four)\b'
matches = [m[1] + ' ' + m[2] for m in re.finditer(pattern, text)]
print(matches)

Output:

['phrase one', 'phrase two', 'phrase three', 'phrase four']
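To get every keyword per cell, as the question asks, the same pattern can be run over each cell in turn. The sketch below assumes the cell contents have already been read from the worksheet into a list of strings; the cells list and the keywords_in_cell helper are hypothetical names used only for illustration. A call such as re.search returns only the first match in a string, which may be why only one keyword per cell was found.

```python
import re

PATTERN = re.compile(r'\b(phrase)\b\W+(?:\w+\W+){0,6}?\b(one|two|three|four)\b')

def keywords_in_cell(cell_text):
    """Return every 'phrase <keyword>' pair found in one cell's text."""
    return [f"{m.group(1)} {m.group(2)}" for m in PATTERN.finditer(cell_text)]

# Hypothetical cell contents standing in for one worksheet column.
cells = [
    "phrase one and later phrase word two",
    "nothing relevant here",
    "phrase three",
]

# One list of matches per cell; a cell without matches gives an empty list.
results = [keywords_in_cell(c) for c in cells]
for found in results:
    print(found)
```

Keeping one result list per cell preserves the link between each cell and its keywords, so the lists can be written back next to the original column if needed.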