Question

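I have a list of dictionaries, where each dictionary holds information about an article. Sometimes the same article 'title' is repeated across dictionaries. I want to remove these duplicate dictionaries so that every article in the list is unique by title, i.e. no title appears in more than one dictionary.

I have: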

data = [{'title':'abc','source':'x','url':'abcx.com'},
        {'title':'abc','source':'y','url':'abcy.com'},
        {'title':'def','source':'g','url':'defg.com'}]

Expected result:

data = [{'title':'abc','source':'x','url':'abcx.com'},
        {'title':'def','source':'g','url':'defg.com'}]

Answer 1

A quick way is to keep track of the titles you have already seen:

titles_seen = set()  # thank you @Mark Meyer
data = [{'title':'abc','source':'x','url':'abcx.com'},
        {'title':'abc','source':'y','url':'abcy.com'},
        {'title':'def','source':'g','url':'defg.com'}]
new_data = []
for item in data:
    # keep only the first dictionary seen for each title
    if item['title'] not in titles_seen:
        new_data.append(item)
    # record the title so later duplicates are skipped
    titles_seen.add(item['title'])

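As @Mark Meyer points out in the comments, you could use title as a key in a dictionary, which eliminates duplicates because the titles are hashed, or you could define an Entry class and then just use a frozenset (potentially overkill):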

>>> data
[<Entry title=abc source=x url=abcx.com />, <Entry title=abc source=y url=abcy.com />, <Entry title=def source=g url=defg.com />]
>>> frozenset(data)
frozenset({<Entry title=def source=g url=defg.com />, <Entry title=abc source=x url=abcx.com />})
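The definition of Entry is not shown in the session above. A minimal sketch of what such a class might look like, assuming that equality and hashing are based on the title alone and that the repr mirrors the output shown (both are assumptions, not the original author's code):

```python
class Entry:
    """Hypothetical wrapper: two entries are 'equal' when their titles match."""

    def __init__(self, title, source, url):
        self.title = title
        self.source = source
        self.url = url

    def __eq__(self, other):
        if not isinstance(other, Entry):
            return NotImplemented
        return self.title == other.title

    def __hash__(self):
        # Must agree with __eq__: equal titles give equal hashes.
        return hash(self.title)

    def __repr__(self):
        return f"<Entry title={self.title} source={self.source} url={self.url} />"


raw = [{'title': 'abc', 'source': 'x', 'url': 'abcx.com'},
       {'title': 'abc', 'source': 'y', 'url': 'abcy.com'},
       {'title': 'def', 'source': 'g', 'url': 'defg.com'}]

entries = [Entry(**d) for d in raw]
unique_entries = frozenset(entries)  # entries with a repeated title collapse into one
```

Note that a frozenset is unordered, so the original list order is not preserved with this approach.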

A better approach, though, is simply to check whether the title already exists before adding to the list.
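For completeness, the dictionary-keyed idea mentioned earlier could look roughly like this. It is only a sketch; the name by_title is made up for illustration, and it relies on dictionaries preserving insertion order (Python 3.7+):

```python
data = [{'title': 'abc', 'source': 'x', 'url': 'abcx.com'},
        {'title': 'abc', 'source': 'y', 'url': 'abcy.com'},
        {'title': 'def', 'source': 'g', 'url': 'defg.com'}]

by_title = {}
for item in data:
    # setdefault stores the item only if the title is not already a key,
    # so the first dictionary seen for each title is the one that is kept
    by_title.setdefault(item['title'], item)

new_data = list(by_title.values())
```

If you would rather keep the last occurrence of each title, plain assignment (by_title[item['title']] = item) does that instead.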

Answer 2

Two lines with a set:

tmp = set()
result = [tmp.add(i['title']) or i for i in data if i['title'] not in tmp]
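The one-liner works because set.add returns None, so tmp.add(...) or i records the title as a side effect and then evaluates to i. If that reads as too clever, the same logic can be spelled out in a small helper. The function name dedupe_by_key is hypothetical and this is just one possible sketch:

```python
def dedupe_by_key(items, key):
    """Return items with duplicates (by items[n][key]) removed, keeping the first."""
    seen = set()
    result = []
    for item in items:
        value = item[key]
        if value not in seen:
            seen.add(value)
            result.append(item)
    return result


data = [{'title': 'abc', 'source': 'x', 'url': 'abcx.com'},
        {'title': 'abc', 'source': 'y', 'url': 'abcy.com'},
        {'title': 'def', 'source': 'g', 'url': 'defg.com'}]

result = dedupe_by_key(data, 'title')
```

Both versions assume the key's values are hashable and that every dictionary contains the key.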

Detect and remove duplicates based on a specific dictionary key in a list of dictionaries
