Suppose
txt='Daniel Johnson and Ana Hickman are friends. They know each other for a long time. Daniel Johnson is a professor and Ana Hickman is writer.'
is a large block of text, and I want to remove a long list of strings from it, for example
removalLists=['Daniel Johnson','Ana Hickman']
That is, I want to replace every element of the list with ' '.
I know I can do this easily with a loop such as
for string in removalLists:
    txt=re.sub(string,' ',txt)
Is there a faster way to do this?
Answer 0 (score: 3)
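One approach is to build a single regex pattern that is an alternation of all the terms to be replaced. For your example, I suggest the following pattern:
\bDaniel Johnson\b|\bAna Hickman\b
To generate it, we first wrap each term in word boundaries (\b), then join the list into a single string using | as the separator. Finally, a single re.sub call replaces every occurrence of any term with a space, so the text is scanned once instead of once per term.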
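import re

txt = 'Daniel Johnson and Ana Hickman are friends. They know each other for a long time. Daniel Johnson is a professor and Ana Hickman is writer.'
removalLists = ['Daniel Johnson','Ana Hickman']
# Wrap each term in word boundaries and join the terms into one alternation
regex = '|'.join([r'\b' + s + r'\b' for s in removalLists])
# Replace every match of any term with a single space in one pass
output = re.sub(regex, " ", txt)
print(output)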
  and   are friends. They know each other for a long time.   is a professor and   is writer.
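If the removal list may contain regex metacharacters (for example a dot or parentheses), or terms that are prefixes of one another, the pattern above can be hardened a little. The following is only a sketch under those assumptions; the helper names build_removal_pattern and remove_terms are made up for illustration and are not part of the re module.

```python
import re

def build_removal_pattern(terms):
    """Compile one alternation pattern matching any of the given literal terms."""
    # Try longer terms first so that 'Ana Hickman' wins over a shorter 'Ana'.
    ordered = sorted(set(terms), key=len, reverse=True)
    # re.escape makes regex metacharacters inside a term match literally.
    return re.compile('|'.join(r'\b' + re.escape(t) + r'\b' for t in ordered))

def remove_terms(text, terms, replacement=' '):
    """Replace every occurrence of any term with `replacement` in one pass."""
    if not terms:
        return text  # an empty alternation would match at every position
    return build_removal_pattern(terms).sub(replacement, text)

txt = 'Daniel Johnson and Ana Hickman are friends.'
cleaned = remove_terms(txt, ['Daniel Johnson', 'Ana Hickman'])
print(cleaned)
```

If the same list is applied to many texts, it is probably worth calling build_removal_pattern once and reusing the compiled pattern. Note that \b only behaves as expected when each term starts and ends with a word character; a term such as 'C++' would need a different boundary check.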