How do I remove duplicate key-value (string) pairs from a dictionary?

Time: 2021-05-24 18:34:07

Tags: python dictionary nlp similarity

I'm trying to remove an entire key-value pair from a dictionary when string similarity shows that it duplicates another pair. Example:

d1 = {1: 'Colins business partner sends millions of dollars to groups which target lives for gruesome deaths domestically and abroad',
      2: 'Colins business partner sends millions of dollars to groups which target lives',
      3: 'Don t skip leg day y all'}

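In the code above, 1 and 2 are similar strings, so one of them has to be removed. The output should be the following, with the original IDs kept unchanged: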

d1 = {1: 'Colins business partner sends millions of dollars to groups which target lives for gruesome deaths domestically and abroad',
      3: 'Don t skip leg day y all'}

Please help me solve this.

1 answer:

Answer 0 (score: 0)

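If by "similarity" you mean that one string is contained in another, and you want to drop the shorter one, you can do it with a nested loop as shown below. Note that you should make a copy of the dictionary and delete from the copy, so that you never modify the dictionary you are iterating over.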
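Why the copy matters: the Python docs warn that adding or deleting entries while iterating over a dict view may raise a RuntimeError, and CPython does raise one as soon as the dict changes size. A minimal demonstration on hypothetical toy data (`d` is made up for illustration, not taken from the question), followed by the common alternative of iterating over a snapshot:

```python
# Hypothetical toy data, only to show what happens when a dict is mutated mid-iteration.
d = {1: 'a', 2: 'ab', 3: 'c'}
error_message = None

try:
    for k, v in d.items():
        if v == 'a':
            del d[k]  # deletes from the same dict the loop is iterating over
except RuntimeError as exc:
    error_message = str(exc)
    print('RuntimeError:', error_message)  # dictionary changed size during iteration

# Alternative to a full copy: iterate over a snapshot of the items, mutate the original.
d = {1: 'a', 2: 'ab', 3: 'c'}
for k, v in list(d.items()):
    if v == 'a':
        del d[k]
print(d)  # {2: 'ab', 3: 'c'}
```

The answer's `dict(d1)` copy and the `list(d.items())` snapshot solve the same problem; which one to use is mostly a matter of taste.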

d1={1:'Colins business partner sends millions of dollars to groups which target lives for gruesome deaths domestically and abroad',
2:'Colins business partner sends millions of dollars to groups which target lives',
3:'Don t skip leg day y all'}

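d2 = dict(d1)  # make a copy of d1; we delete from d2 while iterating over d1
for k, sent in d1.items():
    for sentence in d1.values():
        # sent is a strictly shorter substring of another value, so drop its key
        # (exact duplicates under different keys have equal length and are both kept)
        if sent in sentence and len(sent) != len(sentence):
            del d2[k]
            break  # k is already removed; no need to compare it further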
print(d2)
# {1: 'Colins business partner sends millions of dollars to groups which target lives for gruesome deaths domestically and abroad', 3: 'Don t skip leg day y all'}
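If "similar" should also cover strings that overlap heavily without one containing the other, a fuzzy score is one option. The sketch below swaps in the standard library's `difflib.SequenceMatcher.ratio()` (a different technique from the substring check above). The helper name `dedupe_similar` and the `threshold=0.7` cut-off are hypothetical choices, not anything from the question; the threshold would need tuning on real data:

```python
from difflib import SequenceMatcher


def dedupe_similar(d, threshold=0.7):
    """Hypothetical helper: drop values that are near-duplicates of a longer kept value.

    threshold is an assumed cut-off for SequenceMatcher.ratio() (0.0 to 1.0).
    """
    kept = {}  # key -> value for the entries that survive
    # Visit longer strings first so the longer member of a similar pair is the one kept.
    for k, v in sorted(d.items(), key=lambda kv: len(kv[1]), reverse=True):
        if all(SequenceMatcher(None, v, other).ratio() < threshold for other in kept.values()):
            kept[k] = v
    # Rebuild in the original order so the IDs (keys) and their order are preserved.
    return {k: v for k, v in d.items() if k in kept}


d1 = {1: 'Colins business partner sends millions of dollars to groups which target lives for gruesome deaths domestically and abroad',
      2: 'Colins business partner sends millions of dollars to groups which target lives',
      3: 'Don t skip leg day y all'}

result = dedupe_similar(d1)
print(list(result))  # [1, 3]
```

Unlike the substring check, this also removes exact duplicates stored under different keys (their ratio is 1.0). It compares every pair, so it is quadratic in the number of entries, and a short string buried in a much longer one gets a low ratio, so the two checks catch different cases.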