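How can I remove a substring from text based on a partial match in Python?

Asked: 2019-03-14 03:49:02

Tags: python text data-cleaning

I have a long string of text that contains a piece of text I want to remove based on a partial match (about 90%).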

string = """Adam is a boy who lives in Michigan.
        He loves to eat apples and oranges.
        He also enjoys playing with his dog and cat.
        Adam is a happy boy."""

substring = "He loves to apple oranges"

I want to get back:

"Adam is a boy who lives in Michigan.  
 He also enjoys playing with his dog and cat. 
 Adam is a happy boy."

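The words "eat" and "and" do not appear in the substring (and it has "apple" rather than "apples"), but I still want to remove the entire sentence "He loves to eat apples and oranges." I'm not quite sure how to go about this. Thanks!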

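2 answers:

Answer 0 (score: 4)

You can use difflib.SequenceMatcher to score each line against the substring and keep only the lines that score below a threshold: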

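from difflib import SequenceMatcher
# ' '.__eq__ is the isjunk argument, so spaces are treated as junk; lines scoring 0.6 or higher are dropped.
print('\n'.join(s.strip() for s in string.splitlines() if SequenceMatcher(' '.__eq__, s.strip(), substring).ratio() < 0.6))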

This returns:

Adam is a boy who lives in Michigan.
He also enjoys playing with his dog and cat.
Adam is a happy boy.

Demo: https://ideone.com/twDu1r
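For reuse, the same idea can be wrapped in a small function with the threshold exposed as a parameter. This is only a sketch, not part of the answer above: the name `remove_similar_lines`, its parameters, and the choice to strip indentation from each line before comparing are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def remove_similar_lines(text, pattern, threshold=0.6):
    """Drop every line whose similarity to `pattern` reaches `threshold`.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    kept = []
    for line in text.splitlines():
        candidate = line.strip()  # ignore indentation and trailing spaces
        # The first argument is `isjunk`; here spaces are treated as junk.
        ratio = SequenceMatcher(lambda ch: ch == " ", candidate, pattern).ratio()
        if ratio < threshold:
            kept.append(candidate)
    return "\n".join(kept)


text = """Adam is a boy who lives in Michigan.
        He loves to eat apples and oranges.
        He also enjoys playing with his dog and cat.
        Adam is a happy boy."""
pattern = "He loves to apple oranges"

print(remove_similar_lines(text, pattern))
```

The threshold probably needs tuning per data set. Printing the ratio for each line is a quick way to check whether a stricter cutoff, such as the 90% mentioned in the question, would still catch the sentence you want to remove.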

Answer 1 (score: 0)

string = string.replace(substring,'')

This replaces the substring in the string with an empty string ("").
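Note that str.replace only removes occurrences that match exactly, so on its own it does not cover the partial-match case in the question. The short check below illustrates the difference. The sample text and variable names are made up for this sketch.

```python
text = "Adam is a boy. He loves to eat apples and oranges. Adam is happy."

# Partial match: the phrase does not occur verbatim, so nothing is removed.
partial = "He loves to apple oranges"
unchanged = text.replace(partial, "")

# Exact match: the phrase occurs verbatim, so it is removed.
exact = "He loves to eat apples and oranges. "
cleaned = text.replace(exact, "")

print(unchanged == text)
print(cleaned)
```

When the text to remove is only approximately known, a similarity-based approach such as difflib.SequenceMatcher is likely the better fit.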