Question

I have a list of dictionaries:

l = [{u'content': [], u'name': u'child_01'}, {u'content': [], u'name': u'child_01'} ,{u'content': [], u'name': u'child_04'}]

I want to detect the duplicates and replace them with:

{u'content': [], u'name': u'child'}

set() does not work on dictionaries.

Answer 1

This is not a complete answer, but it may give you some pointers. You can do the following:

l = [{u'content': [], u'name': u'child_01'}, {u'content': [], u'name': u'child_01'} ,{u'content': [], u'name': u'child_04'}]

from itertools import groupby
uniques = [key for key, value in groupby(l)]
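Note that groupby only merges equal items that sit next to each other, so the snippet above works here because the two child_01 entries are adjacent. A minimal sketch of an order-preserving variant that also handles non-adjacent duplicates, assuming the list is small enough for pairwise equality checks (unique_dicts is a hypothetical helper name):

```python
from itertools import groupby


def unique_dicts(items):
    """Return items without duplicates, keeping first occurrences in order."""
    seen = []
    for item in items:
        # Dicts are unhashable, so fall back to equality comparison.
        if item not in seen:
            seen.append(item)
    return seen


data = [{'name': 'child_01'}, {'name': 'child_04'}, {'name': 'child_01'}]

# groupby leaves the non-adjacent duplicate in place.
grouped = [key for key, _ in groupby(data)]
print(len(grouped))             # 3
print(len(unique_dicts(data)))  # 2
```

The membership test is linear, so this is quadratic overall; that is usually fine for short lists like the one in the question.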

The problem you ran into is that dictionaries are not hashable, so you cannot call set() on them. I know this solution does not insert:

{u'content': [], u'name': u'child'}

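for the duplicates, but if you inserted it for every duplicate you would just create more duplicates, so the following may do instead. First reduce the list to its unique values, then compare the list lengths before and after the reduction, and add as many "default" dictionaries as you need, like this: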

defaults_cnt = len(l) - len(uniques)
default = {u'content': [], u'name': u'child'}
uniques.extend(default for _ in xrange(defaults_cnt))

Now:

from pprint import pprint
pprint(uniques)
[{'content': [], 'name': 'child_01'},
 {'content': [], 'name': 'child_04'},
 {'content': [], 'name': 'child'}]
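If the replacements should stay at the positions of the duplicates rather than be appended at the end, one possible sketch is below. It assumes that only the first occurrence is kept and that each later copy gets its own copy of the default dictionary; replace_duplicates is a hypothetical helper, not part of the approach above.

```python
import copy


def replace_duplicates(items, default):
    """Keep first occurrences; swap each later duplicate for a copy of default."""
    seen = []
    result = []
    for item in items:
        if item in seen:
            # deepcopy so replacements do not share the same 'content' list.
            result.append(copy.deepcopy(default))
        else:
            seen.append(item)
            result.append(item)
    return result


l = [{'content': [], 'name': 'child_01'},
     {'content': [], 'name': 'child_01'},
     {'content': [], 'name': 'child_04'}]
default = {'content': [], 'name': 'child'}

result = replace_duplicates(l, default)
print(result)
```

With the sample list, the second child_01 entry is replaced by the default dictionary and the other two entries are left as they are.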

Answer 2

You can define a class Entry that subclasses dict and make it hashable.

class Entry(dict):
    def __init__(self, *args, **kwargs):
        super(Entry, self).__init__(*args, **kwargs)
        self['name'] = self['name'].split('_')[0]

    def __hash__(self):
        return hash(self['name'])

    def __eq__(self, other):
        if isinstance(other, Entry):
            return self['name'] == other['name']
        return NotImplemented

    def __ne__(self, other):
        return not self == other

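The name attribute is normalized in the constructor so that the numeric suffix is removed.

To use this class with set, you also need to define the __eq__ and __ne__ operators.

You can use this class to remove duplicates like this: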

l = [{u'content': [], u'name': u'child_01'},
     {u'content': [], u'name': u'child_01'},
     {u'content': [], u'name': u'child_04'}]

entries = set(Entry(attrs) for attrs in l)

print(entries)

You get a set of dicts in which entries that share a normalized name appear only once.
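The same grouping can also be sketched without subclassing dict, by keying a plain dictionary on the normalized name. This assumes, as the class above does, that everything before the first underscore identifies an entry; dedupe_by_prefix is a hypothetical name.

```python
def dedupe_by_prefix(items):
    """Keep one entry per name prefix (the part before the first '_')."""
    by_prefix = {}
    order = []
    for item in items:
        prefix = item['name'].split('_')[0]
        if prefix not in by_prefix:
            # Copy so the caller's dicts are not modified.
            entry = dict(item)
            entry['name'] = prefix
            by_prefix[prefix] = entry
            order.append(prefix)
    # Return entries in the order their prefixes first appeared.
    return [by_prefix[p] for p in order]


l = [{'content': [], 'name': 'child_01'},
     {'content': [], 'name': 'child_01'},
     {'content': [], 'name': 'child_04'}]

deduped = dedupe_by_prefix(l)
print(deduped)
```

With the sample list, all three entries collapse into a single one named child, because child_01 and child_04 share the same prefix. The same happens with the Entry class.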

Python 2.7: Replace duplicates in a list of dictionaries
