Question

import re

regex = r"[^.?!-]*(?<=[.?\s!-])\b(pfs)\b(?=[\s.?!-])[^.?!-]*[.?!-]"

test_str = "pfs alert conf . it is unlikely that we will sign it - pfs of $ 950 filed to driver - we are gathering information"

subst = ""

result = re.sub(regex, subst, test_str, count=0, flags=re.IGNORECASE | re.MULTILINE)

if result:
    print(result)

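We can see that test_str has two sentences containing the keyword "pfs". However, the Python code above only matches the second one, 'pfs of $ 950 filed to driver'. How can I modify it so that it also extracts 'pfs alert conf'?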

Answer 1

Consider using nltk instead; IMO it is a much better fit here:

from nltk import sent_tokenize

test_str = "pfs alert conf . it is unlikely that we will sign it - pfs of $ 950 filed to driver - we are gathering information. some junky words thereafter"
sentences = [sent for sent in sent_tokenize(test_str) if "pfs" in sent]
print(sentences)

This yields (note that the last sentence, which does not contain pfs, is left out):

['pfs alert conf .', 
 'it is unlikely that we will sign it - pfs of $ 950 filed to driver - we are gathering information.']

Answer 2

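The first pfs is at the start of the line, but the positive lookbehind (?<=[.?\s!-]) requires one of those characters to come right before it. Nothing precedes the start of the string, so the assertion fails there. You can use an alternation to assert either the start of the line (^) or the lookbehind that the pattern currently requires: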

[^.?!-]*(?<=[.?\s!-])

Regex demo

Demo python
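
To make the alternation concrete, here is a minimal sketch of how the full pattern might look once the fix is applied. The exact pattern is an assumption pieced together from the question's regex and the description above, not a quote from the answer. It also drops the capture group around pfs and uses re.finditer to collect the matches rather than re.sub to delete them.

```python
import re

# Assumed fix: accept either the start of the line (^) or the original
# lookbehind before the keyword. The rest mirrors the question's pattern.
PATTERN = re.compile(
    r"[^.?!-]*(?:^|(?<=[.?\s!-]))\bpfs\b(?=[\s.?!-])[^.?!-]*[.?!-]",
    re.IGNORECASE | re.MULTILINE,
)

test_str = ("pfs alert conf . it is unlikely that we will sign it - "
            "pfs of $ 950 filed to driver - we are gathering information")

# group(0) is the whole match; strip() drops the leading space that
# [^.?!-]* picks up after the previous terminator.
sentences = [m.group(0).strip() for m in PATTERN.finditer(test_str)]
print(sentences)
# ['pfs alert conf .', 'pfs of $ 950 filed to driver -']
```

Because the pattern treats "-" as a sentence terminator, the second match ends at the hyphen instead of running on to the end of the clause.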

Python Regex - Extract multiple sentences containing the same keyword
