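I am trying to remove an entire key-value pair from a dictionary when its value turns out to be a duplicate based on string similarity. Example: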
d1 = {1: 'Colins business partner sends millions of dollars to groups which target lives for gruesome deaths domestically and abroad',
      2: 'Colins business partner sends millions of dollars to groups which target lives',
      3: 'Don t skip leg day y all'}
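In the code above, the values for keys 1 and 2 are similar strings, so one of them has to be removed. The expected output, keeping the original IDs unchanged, is: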
d1 = {1: 'Colins business partner sends millions of dollars to groups which target lives for gruesome deaths domestically and abroad',
      3: 'Don t skip leg day y all'}
Please help me solve this.
Answer 0 (score: 0)
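If by "similarity" you mean that one string is contained in another, and you want to eliminate the shorter one, you can do it with a nested loop as shown below. Note that you should delete from a copy of the dictionary rather than from the one you are iterating over: deleting keys from a dict while looping over it raises "RuntimeError: dictionary changed size during iteration".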
d1 = {1: 'Colins business partner sends millions of dollars to groups which target lives for gruesome deaths domestically and abroad',
      2: 'Colins business partner sends millions of dollars to groups which target lives',
      3: 'Don t skip leg day y all'}
d2 = dict(d1)  # shallow copy: we delete from d2 while iterating over d1
for k, sent in d1.items():
    for sentence in d1.values():
        # sent is a proper substring of a longer value, so drop key k.
        # The length check skips comparing sent with itself
        # (side effect: exact duplicates are all kept).
        if sent in sentence and len(sent) != len(sentence):
            del d2[k]
            break  # k is already deleted; no need to check further
print(d2)
# {1: 'Colins business partner sends millions of dollars to groups which target lives for gruesome deaths domestically and abroad', 3: 'Don t skip leg day y all'}
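The loop above treats "similar" as "one string contains the other". If you need fuzzy similarity instead, here is a minimal sketch using the standard library's difflib.SequenceMatcher.ratio(). The function name dedupe_similar and the 0.6 threshold are hypothetical choices for illustration only, not anything from the question; tune the threshold for your own data.

```python
from difflib import SequenceMatcher


def dedupe_similar(d, threshold=0.6):
    """Return a copy of d without values that are similar to a longer kept value.

    Similarity is SequenceMatcher.ratio() in [0, 1]; threshold=0.6 is an
    arbitrary, hypothetical cut-off.
    """
    kept = []  # (key, value) pairs accepted so far, longest first
    # Visit longer strings first so that, of two similar strings, the longer survives.
    for key, value in sorted(d.items(), key=lambda kv: len(kv[1]), reverse=True):
        # Keep this value only if it is dissimilar to every value kept so far.
        if all(SequenceMatcher(None, value, other).ratio() < threshold
               for _, other in kept):
            kept.append((key, value))
    kept_keys = {key for key, _ in kept}
    # Rebuild in the original insertion order so the IDs are unchanged.
    return {k: v for k, v in d.items() if k in kept_keys}


d1 = {1: 'Colins business partner sends millions of dollars to groups which target lives for gruesome deaths domestically and abroad',
      2: 'Colins business partner sends millions of dollars to groups which target lives',
      3: 'Don t skip leg day y all'}
print(dedupe_similar(d1))  # keeps keys 1 and 3
```

Unlike the substring check, identical values have a ratio of 1.0, so exact duplicates are also collapsed to one entry. Every new value is compared against all kept values, so this is O(n²) comparisons and may be slow for large dictionaries.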