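How to remove duplicate dicts from a list, ignoring a dict key?

Date: 2015-05-17 14:19:35

Tags: python python-2.7

I have a list of dictionaries. Each dictionary has several key-value pairs, plus one arbitrary (but important) key-value pair. For example: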

thelist = [
    {"key" : "value1", "k2" : "va1", "ignore_key" : "arb1"}, 
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb11"},
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb113"}
]

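I want to remove duplicate dictionaries, comparing only the values of the keys other than "ignore_key". I have seen a related question, but it only considers dicts that are exactly identical. Is there a way to remove near-duplicates, so that the data above becomes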

thelist = [
    {"key" : "value1", "k2" : "va1", "ignore_key" : "arb1"}, 
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb11"}
]

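It does not matter which duplicate is dropped. How can I do this?

7 answers:

Answer 0: (score: 5)

Keep a set of the (key, k2) values seen so far and remove any dict whose values have already been seen: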

st = set()

for d in thelist[:]:
    vals = d["key"],d["k2"]
    if vals in st:
        thelist.remove(d)
    st.add(vals)
print(thelist)

[{'k2': 'va1', 'ignore_key': 'arb1', 'key': 'value1'},
{'k2': 'va2', 'ignore_key': 'arb11', 'key': 'value2'}]

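If the duplicates are always grouped together (adjacent in the list), you can group by the values of key and k2 and take the first dict from each group: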

from itertools import groupby
from operator import itemgetter
thelist[:] = [next(v) for _, v in groupby(thelist,itemgetter("key","k2"))]
print(thelist)
[{'key': 'value1', 'k2': 'va1', 'ignore_key': 'arb1'}, 
{'key': 'value2', 'k2': 'va2', 'ignore_key': 'arb11'}]
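
groupby only merges adjacent items, so the approach above depends on the duplicates sitting next to each other. A minimal sketch for the unsorted case follows, assuming the values of key and k2 can be ordered against each other; the sample data and the name deduped are illustrative and not part of the answer:

```python
from itertools import groupby
from operator import itemgetter

# Illustrative data: the two "value2" dicts are NOT adjacent.
thelist = [
    {"key": "value2", "k2": "va2", "ignore_key": "arb11"},
    {"key": "value1", "k2": "va1", "ignore_key": "arb1"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb113"},
]

keyfunc = itemgetter("key", "k2")

# Sort first so equal (key, k2) pairs become adjacent, then keep the
# first dict of each group. sorted() is stable, so within a group the
# dict that came first in the original list is still first.
deduped = [next(group) for _, group in groupby(sorted(thelist, key=keyfunc), keyfunc)]
print(deduped)
```

The result comes back in sorted order rather than the original order, which may or may not matter for your data.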

Or use a generator, similar to DSM's answer, and assign the result back into the original list:

def filt(l):
    st = set()
    for d in l:
        vals = d["key"],d["k2"]
        if vals not in st:
            yield d
        st.add(vals)


thelist[:] = filt(thelist)

print(thelist)

[{'k2': 'va1', 'ignore_key': 'arb1', 'key': 'value1'}, 
{'k2': 'va2', 'ignore_key': 'arb11', 'key': 'value2'}]

If you don't care which duplicate is removed, just use reversed:

st = set()

for d in reversed(thelist):
    vals = d["key"],d["k2"]
    if vals in st:
        thelist.remove(d)
    st.add(vals)
print(thelist)
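
Iterating in reverse means the last dict of each duplicate group is the one that survives. Here is a sketch of the same idea that builds a new list instead of removing items during iteration; the function name dedupe_keep_last and its keys parameter are made up for illustration:

```python
def dedupe_keep_last(dicts, keys=("key", "k2")):
    """Return a new list keeping the LAST dict for each combination of `keys`."""
    seen = set()
    kept = []
    for d in reversed(dicts):
        vals = tuple(d[k] for k in keys)
        if vals not in seen:
            seen.add(vals)
            kept.append(d)
    kept.reverse()  # restore the original relative order
    return kept


thelist = [
    {"key": "value1", "k2": "va1", "ignore_key": "arb1"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb11"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb113"},
]
result = dedupe_keep_last(thelist)
print(result)
```

With the question's data, the dict carrying "arb113" is kept and the one carrying "arb11" is dropped.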

To ignore ignore_key and compare on all the other keys with groupby:

from itertools import groupby

thelist[:] = [next(v) for _, v in groupby(thelist, lambda d: 
                [val for k, val in d.items() if k != "ignore_key"])]
print(thelist)
[{'key': 'value1', 'k2': 'va1', 'ignore_key': 'arb1'},
 {'key': 'value2', 'k2': 'va2', 'ignore_key': 'arb11'}]
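
The key function above compares lists of values in whatever order d.items() yields them, so two dicts holding the same data could in principle produce different lists. A hedged variant sorts the (key, value) pairs first. It assumes the dict keys can be ordered against each other (for example, all strings) and still assumes duplicates are adjacent; the helper name comparable_items is hypothetical:

```python
from itertools import groupby

IGNORED = "ignore_key"


def comparable_items(d):
    # Sorted (key, value) pairs of everything except the ignored key,
    # so the result does not depend on the dict's iteration order.
    return sorted((k, v) for k, v in d.items() if k != IGNORED)


thelist = [
    {"key": "value1", "k2": "va1", "ignore_key": "arb1"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb11"},
    # Same data as the previous dict, but inserted in a different order.
    {"k2": "va2", "ignore_key": "arb113", "key": "value2"},
]

thelist[:] = [next(group) for _, group in groupby(thelist, comparable_items)]
print(thelist)
```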

Answer 1: (score: 2)

You could cram things into a line or two, but I think it's cleaner to write a function:

def f(seq, ignore_keys):
    seen = set()
    for elem in seq:
        index = frozenset((k,v) for k,v in elem.items() if k not in ignore_keys)
        if index not in seen:
            yield elem
            seen.add(index)

which gives

>>> list(f(thelist, ["ignore_key"]))
[{'ignore_key': 'arb1', 'k2': 'va1', 'key': 'value1'}, 
 {'ignore_key': 'arb11', 'k2': 'va2', 'key': 'value2'}]

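This assumes your values are hashable. (If they aren't, the same idea works with seen = [] and seen.append(index), with index built as a plain dict rather than a frozenset, though it won't perform well on long lists.)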
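
Here is a minimal sketch of the list-based fallback for unhashable values; the name f_unhashable and the sample data are illustrative. It compares plain dicts, since a frozenset of items would itself need hashable values:

```python
def f_unhashable(seq, ignore_keys):
    """Like f, but usable when the values are unhashable (lists, dicts, ...)."""
    seen = []  # list membership uses ==, so nothing needs to be hashed
    for elem in seq:
        index = {k: v for k, v in elem.items() if k not in ignore_keys}
        if index not in seen:
            yield elem
            seen.append(index)


# Illustrative data whose "key" values are lists, which cannot be hashed.
data = [
    {"key": ["a", "b"], "ignore_key": "arb1"},
    {"key": ["c"], "ignore_key": "arb11"},
    {"key": ["c"], "ignore_key": "arb113"},
]
result = list(f_unhashable(data, ["ignore_key"]))
print(result)
```

Each membership test scans the whole seen list, so the running time grows quadratically with the number of distinct items.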

Answer 2: (score: 1)

Starting from the original list:

thelist = [
    {"key" : "value1", "ignore_key" : "arb1"}, 
    {"key" : "value2", "ignore_key" : "arb11"},
    {"key" : "value2", "ignore_key" : "arb113"}
]

Create a set and populate it while filtering the list.

uniques, theNewList = set(), []
for d in thelist:
    cur = d["key"]  # Avoid multiple lookups of the same thing
    if cur not in uniques:
        theNewList.append(d)
    uniques.add(cur)

Finally, rebind the name to the new list:

thelist = theNewList

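Answer 3: (score: 0)

Instead of a list of dicts, you can use a dict of dicts. The value stored under each dict's "key" entry becomes its key in the main dictionary.

Like this: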

thedict = {}

thedict["value1"] = {"ignore_key" : "arb1", ...}  
thedict["value2"] = {"ignore_key" : "arb11", ...}

Since a dict does not allow duplicate keys, your problem goes away.
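
Here is a sketch of how such a dict of dicts could be built from the existing list. Using the (key, k2) pair as the outer key is an assumption based on the question's sample data, and the names thedict and deduped are illustrative:

```python
thelist = [
    {"key": "value1", "k2": "va1", "ignore_key": "arb1"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb11"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb113"},
]

# Key the outer dict on the fields that define a duplicate.
thedict = {}
for d in thelist:
    # setdefault only stores d when the key is not present yet,
    # so the first dict of each duplicate group wins.
    thedict.setdefault((d["key"], d["k2"]), d)

# Convert back to a list if one is still needed.
deduped = list(thedict.values())
print(deduped)
```

Plain assignment (thedict[...] = d) would keep the last duplicate instead of the first.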

Answer 4: (score: 0)

Without modifying thelist:

result = []
seen = set()
thelist = [
    {"key" : "value1", "ignore_key" : "arb1"},
    {"key" : "value2", "ignore_key" : "arb11"},
    {"key" : "value2", "ignore_key" : "arb113"}
]

for item in thelist:
    if item['key'] not in seen:
        result.append(item)
        seen.add(item['key'])

print(result)

Answer 5: (score: 0)

Create a set of the unique values, then check (and update) it:

values = {d['key'] for d in thelist}
newlist = []

for d in thelist:
    if d['key'] in values:
        newlist.append(d)
        values -= {d['key']}

thelist = newlist

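Answer 6: (score: 0)

You can adapt the accepted answer to the linked question by using a dictionary instead of a set to remove the duplicates.

The following first builds a temporary dictionary whose keys are tuples of the items of each dict in thelist, except the ignored one, which is saved as the value associated with each key. Doing this eliminates the duplicates, because they become the same key, while preserving the ignored key and its value (the last one seen, or the only one).

The second step recreates thelist by building a dict from each key combined with its associated value in the temporary dictionary.

If you wanted to, you could combine these two steps into a completely unreadable one-liner...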

thelist = [
    {"key" : "value1", "k2" : "va1", "ignore_key" : "arb1"},
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb11"},
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb113"}
]

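IGNORED = "ignore_key"
# Step 1: map each tuple of non-ignored items to its (IGNORED, value) pair.
# Duplicates collapse onto the same key, and the last one seen wins.
temp = dict((tuple(item for item in d.items() if item[0] != IGNORED),
             (IGNORED, d.get(IGNORED))) for d in thelist)
# Step 2: rebuild each dict from its key tuple plus the saved ignored item.
# items() and print() work on both Python 2.7 and Python 3.
thelist = [dict(key + (value,)) for key, value in temp.items()]

for item in thelist:
    print(item)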