Efficient way to replace substrings from a list

Time: 2018-08-10 11:17:32

Tags: python list replace similarity

Hi, I have a large document stored as a single sentence (one string), and a list of proper names that appear in it.

I want to replace every occurrence of a name from the list with the tag [PERSON].

Example: sentence = "John and Marie went to school today....."

list = ["Maria", "John"....]
  

result = [PERSON] and [PERSON] went to school today

As you can see, I would also like to match names such as Maria and Marie, which are spelled differently but are close.
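To make "close" concrete, here is a minimal sketch that scores the two spellings with the standard library's difflib.SequenceMatcher (an illustration added here, not something the question uses). Its ratio() is 2*M/T, where M is the number of matched characters and T is the combined length of both strings.

```python
from difflib import SequenceMatcher

# ratio() returns 2*M/T: M = matched characters, T = total length of both strings
score = SequenceMatcher(None, "Maria", "Marie").ratio()
print(score)  # 0.8
```

Maria and Marie share the 4-character block "Mari", so the score is 2*4/10 = 0.8. Any similarity threshold stricter than 0.8 (or 80 on a 0 to 100 scale) would treat them as different names.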

I know I could do this with a loop, but since both the list and the sentence are large, there may be a more efficient way. Thanks!

2 answers:

Answer 0 (score: 1)

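Use fuzzywuzzy to check whether each word in the sentence closely matches one of the names (a fuzz.ratio score of at least 80 out of 100); if it does, replace the word with [PERSON]: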

>>> from fuzzywuzzy import process, fuzz
>>> names = ["Maria", "John"]
>>> sentence = "John and Marie went to school today....."
>>>
>>> match = lambda word: process.extractOne(word, names, scorer=fuzz.ratio, score_cutoff=80)
>>> ' '.join('[PERSON]' if match(word) else word  for word in sentence.split())
'[PERSON] and [PERSON] went to school today.....'
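If you would rather avoid the third-party dependency, a similar approach can be sketched with the standard library's difflib.get_close_matches, whose SequenceMatcher ratio is comparable to (but not guaranteed identical to) fuzz.ratio. The names make_name_tagger and looks_like_name are hypothetical helpers. The sketch assumes names are single words compared case-insensitively, and it caches per-word results with functools.lru_cache, because a large document repeats the same words many times.

```python
import difflib
import re
from functools import lru_cache


def make_name_tagger(names, cutoff=0.8, tag="[PERSON]"):
    """Return a function that replaces words resembling any of `names` with `tag`."""
    lowered = [name.lower() for name in names]
    word_re = re.compile(r"\w+")

    @lru_cache(maxsize=None)
    def looks_like_name(word):
        # get_close_matches keeps candidates whose SequenceMatcher ratio is >= cutoff
        return bool(difflib.get_close_matches(word.lower(), lowered, n=1, cutoff=cutoff))

    def replace(match):
        word = match.group(0)
        return tag if looks_like_name(word) else word

    def tag_sentence(text):
        # One regex pass over the text; each distinct word is scored only once thanks to the cache
        return word_re.sub(replace, text)

    return tag_sentence


tag_sentence = make_name_tagger(["Maria", "John"])
result = tag_sentence("John and Marie went to school today.....")
print(result)  # [PERSON] and [PERSON] went to school today.....
```

Because re.sub with \w+ rewrites only runs of word characters, punctuation such as the trailing dots stays in place and the original spacing is preserved, unlike split() followed by join(). Each distinct word is still compared against every name, so the cache helps most when the vocabulary is much smaller than the document.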

Answer 1 (score: 0)

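You can put regular expressions in the input list to match words with spelling variations. For example, to match both Marie and Maria, you can use Mari(e|a) as the pattern. Here is code that does this: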

import re

mySentence = "John and Marie and Maria went to school today....."
myList = ["Mari(e|a)", "John"]

myNewSentence = re.compile("|".join(myList)).sub('[PERSON]', mySentence)

print(myNewSentence)  # [PERSON] and [PERSON] and [PERSON] went to school today.....
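One caveat with the pattern above: it has no word boundaries, so a name embedded in a longer word (for example "Johnny") would become "[PERSON]ny". A minimal sketch of a variant, assuming the list entries are regex fragments as in this answer, wraps each entry in a non-capturing group and anchors the whole alternation with \b. The helper name build_name_pattern is hypothetical.

```python
import re


def build_name_pattern(fragments):
    """Compile a list of regex fragments into one whole-word alternation."""
    # (?:...) keeps each fragment's own alternation (e.g. "Mari(e|a)") self-contained,
    # and \b on both sides stops matches inside longer words.
    alternation = "|".join(f"(?:{frag})" for frag in fragments)
    return re.compile(rf"\b(?:{alternation})\b")


pattern = build_name_pattern(["Mari(e|a)", "John"])
sentence = "John and Marie and Maria went to school today, Johnny too."
result = pattern.sub("[PERSON]", sentence)
print(result)  # [PERSON] and [PERSON] and [PERSON] went to school today, Johnny too.
```

If the list holds plain names rather than patterns, pass each one through re.escape first so that characters such as "." or "(" are matched literally.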