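Python: merging 2 or more dicts, using the values to handle duplicate keys

Time: 2014-12-07 22:07:27

Tags: python dictionary merge key

I am merging dictionaries that have some duplicate keys. The values will differ, and I want to ignore the record with the lower value.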

dict1 = {1 :["in",1], 2 :["out",1], 3 :["in",1]}
dict2 = {1 :["out",2], 2 :["out",1]}

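If the keys are equal, I want the entry with the highest number at index [1] of its value list to end up in the new dict. The output of merging these two dicts should be: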

dict3 = {1 :["out",2], 2 :["out",1], 3 :["in",1]}

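The only way I know to solve this is to run a loop with a condition that decides which entry to add to the merged dict. Is there a more pythonic way?

Duplicate keys are very rare, fewer than 1%, in case that makes any difference to the final solution.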

5 answers:

Answer 0 (score: 2)

A single dict comprehension can do this:

from operator import itemgetter
{k: max(dict1.get(k, (None, float('-Inf'))), dict2.get(k, (None,float('-Inf'))),
key=itemgetter(1)) for k in dict1.viewkeys() | dict2.viewkeys()}
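
The comprehension above uses viewkeys(), which exists only in Python 2. A minimal sketch of the same idea for Python 3 follows, assuming the two dicts from the question; the name MISSING is a hypothetical placeholder introduced here for readability.

```python
from operator import itemgetter

dict1 = {1: ["in", 1], 2: ["out", 1], 3: ["in", 1]}
dict2 = {1: ["out", 2], 2: ["out", 1]}

# Hypothetical placeholder for a key that is absent from one dict.
# Its second element is -inf, so it always loses the comparison.
MISSING = (None, float("-inf"))

# In Python 3, dict.keys() returns a set-like view, so "|" gives the union of keys.
dict3 = {
    k: max(dict1.get(k, MISSING), dict2.get(k, MISSING), key=itemgetter(1))
    for k in dict1.keys() | dict2.keys()
}
print(dict3)
```

When both values tie on their [1] element, max() returns the first argument, so the entry from dict1 is kept.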

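Answer 1 (score: 2)

A pythonic solution should rely heavily on the Python standard library and the available syntactic constructs, not only to simplify the code but also to improve performance.

In your case you can benefit from the fact that only 1% of the keys occur in both dictionaries: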

conflictKeys = set(dict1) & set(dict2)      # keys that occur in both dictionaries
solvedConflicts = { key: dict1[key]
                         if dict1[key][1] > dict2[key][1]
                         else dict2[key]
                    for key in conflictKeys }  # conflicting keys only, each mapped to the wanted value

result = dict1.copy()                       # start with all entries of dict1
result.update(dict2)                        # add the entries of dict2 (overwrites conflicting keys)
result.update(solvedConflicts)              # overwrite conflicting keys with the resolved values

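This solution tries to avoid running the "slow" Python interpreter for every key of both dictionaries and instead relies on fast Python library routines (written in C), namely:

  • dict.update() to merge the two dictionaries
  • set.intersection() (a synonym for set1 & set2) to find all conflicts

The Python interpreter only has to iterate over entries when resolving the conflicting keys. Even there you can gain some performance from the pythonic comprehension construct (here a dict comprehension) compared with an imperative for loop. As I understand it, this is because solvedConflicts is built in one expression, whereas an imperative loop has to grow the resulting solvedConflicts element by element, which costs more per entry.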
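
Since the question mentions two or more dicts, the conflict-key approach can be folded over any number of inputs. This is only a sketch for Python 3; the function names merge_two and merge_all are hypothetical and not part of the original answer.

```python
from functools import reduce

def merge_two(d1, d2):
    """Merge two dicts; on a shared key keep the value with the larger [1] element."""
    conflict_keys = d1.keys() & d2.keys()   # keys present in both dicts
    solved = {
        k: d1[k] if d1[k][1] > d2[k][1] else d2[k]
        for k in conflict_keys
    }
    result = d1.copy()      # all entries of d1
    result.update(d2)       # all entries of d2, overwriting shared keys
    result.update(solved)   # shared keys get the resolved value
    return result

def merge_all(*dicts):
    """Fold merge_two over any number of dicts, starting from an empty dict."""
    return reduce(merge_two, dicts, {})

dict1 = {1: ["in", 1], 2: ["out", 1], 3: ["in", 1]}
dict2 = {1: ["out", 2], 2: ["out", 1]}
print(merge_all(dict1, dict2))
```

On a tie the value from the later dict wins, because the comparison uses a strict ">".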

Answer 2 (score: 1)

dict1 = {1 :["in",1], 2 :["out",1], 3 :["in",1]}

dict2 = {1 :["out",2], 2 :["out",1]}
vals = []
# get items from dict1 and common keys with largest values
for k, v in dict1.iteritems():
    if k in dict2:
        if dict2[k][1] > v[1]:
            vals.append((k, dict2[k]))
        else:
            vals.append((k,v))
    else:
        vals.append((k,v))
new_d = {}
# add all dict2 to a new dict
new_d.update(dict2) 

# add dict1 items and overwrite common keys with larger value
for k,v in vals:
    new_d[k] = v
print(new_d)
{1: ['out', 2], 2: ['out', 1], 3: ['in', 1]}

You can also copy and delete:

cp_d1 = dict1.copy()
cp_d2 = dict2.copy()

for k, v in dict1.iteritems():
    if k in dict2:
        if dict2[k][1] > v[1]:
            del cp_d1[k]
        else:
            del cp_d2[k]
cp_d1.update(cp_d2)

print(cp_d1)
{1: ['out', 2], 2: ['out', 1], 3: ['in', 1]}

Some timings show that copying is the most efficient and using groupby the least efficient:

In [9]: %%timeit
   ...: vals = []
   ...: cp_d1 = dict1.copy()
   ...: cp_d2 = dict2.copy()
   ...: for k, v in dict1.iteritems():
   ...:     if k in dict2:
   ...:         if dict2[k][1] > v[1]:
   ...:             del cp_d1[k]
   ...:         else:
   ...:             del cp_d2[k]
   ...: cp_d1.update(cp_d2)
   ...: 

1000000 loops, best of 3: 1.61 µs per loop

In [20]: %%timeit
   ....: vals = []
   ....: for k, v in dict1.iteritems():
   ....:     if k in dict2:
   ....:         if dict2[k][1] > v[1]:
   ....:             vals.append((k, dict2[k]))
   ....:         else:
   ....:             vals.append((k,v))
   ....:     else:
   ....:         vals.append((k,v))
   ....: new_d = {}
   ....: new_d.update(dict2)
   ....: for k,v in vals:
   ....:     new_d[k] = v
   ....: 
100000 loops, best of 3: 2.11 µs per loop


In [10]: %%timeit
   ....: {k: max(dict1.get(k), dict2.get(k), key=lambda x: x[1] if x else None)
   ....:  for k in dict1.viewkeys() | dict2.viewkeys()}
   ....: 
100000 loops, best of 3: 3.71 µs per loop

In [22]: %%timeit
   ....: l=dict2.items() +dict1.items() # if you are in python 3 use : list(dict1.items()) + list(dict2.items())
   ....: g=[list(g) for k,g in groupby(sorted(l),lambda x : x[0])]
   ....: dict([max(t,key=lambda x: x[1][1]) for t in g])
   ....: 
100000 loops, best of 3: 10.1 µs per loop


In [61]: %%timeit
   ....: conflictKeys = set(dict1) & set(dict2)  
   ....: solvedConflicts = { key: dict1[key] 
   ....:                       if dict1[key][1] > dict2[key][1] 
   ....:                       else dict2[key] 
   ....:                  for key in conflictKeys } 
   ....: result = dict1.copy()                     
   ....: result.update(dict2)                       
   ....: result.update(solvedConflicts)  
   ....: 

100000 loops, best of 3: 2.34 µs per loop
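
The timings above were taken on the three-entry dicts from the question, so they may not reflect larger inputs. The following is a rough sketch, for Python 3, of how two of the approaches could be re-timed on bigger dicts with about 1% shared keys. The helper make_dicts, the function names, and the sizes are assumptions made for illustration, and no results are claimed here.

```python
import timeit

def make_dicts(n=10_000, overlap=0.01):
    """Build two dicts of n entries each whose key ranges share about `overlap` of the keys."""
    shared = int(n * overlap)
    d1 = {k: ["in", k % 7] for k in range(n)}
    d2 = {k: ["out", k % 5] for k in range(n - shared, 2 * n - shared)}
    return d1, d2

def merge_copy_delete(d1, d2):
    """Copy both dicts, delete the losing entry for each shared key, then update."""
    cp1, cp2 = d1.copy(), d2.copy()
    for k, v in d1.items():
        if k in d2:
            if d2[k][1] > v[1]:
                del cp1[k]
            else:
                del cp2[k]
    cp1.update(cp2)
    return cp1

def merge_conflict_keys(d1, d2):
    """Resolve only the shared keys in a comprehension, then update twice."""
    conflicts = d1.keys() & d2.keys()
    solved = {k: d1[k] if d1[k][1] > d2[k][1] else d2[k] for k in conflicts}
    result = d1.copy()
    result.update(d2)
    result.update(solved)
    return result

big1, big2 = make_dicts()
for func in (merge_copy_delete, merge_conflict_keys):
    seconds = timeit.timeit(lambda: func(big1, big2), number=20)
    print(f"{func.__name__}: {seconds / 20 * 1e3:.3f} ms per call")
```

Note that the two functions break ties differently (the first keeps the value from the first dict, the second keeps the value from the second), so their results agree on the [1] element but not necessarily on the whole value.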

Answer 3 (score: 1)

If, as mentioned, the intersection of the keys is small, using sets also helps:

def out_dict(dict1, dict2):
    dict3 = {}
    s1 = set(dict1)
    s2 = set(dict2)
    for i in s1-s2:
        dict3[i] = dict1[i]
    for i in s2-s1:
        dict3[i] = dict2[i]
    for i in s1.intersection(s2):
        # compare the numeric [1] elements, not the whole lists
        dict3[i] = dict1[i] if dict1[i][1] >= dict2[i][1] else dict2[i]
    return dict3

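The set differences pick out the keys that occur in only one of the dictionaries, and the intersection is used for the keys common to both.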
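
To make the three key groups concrete, here is a small sketch for Python 3 that applies the same set operations directly to the dict key views, using the data from the question. The variable names only_in_1, only_in_2 and in_both are introduced here for illustration.

```python
dict1 = {1: ["in", 1], 2: ["out", 1], 3: ["in", 1]}
dict2 = {1: ["out", 2], 2: ["out", 1]}

# Python 3 key views are set-like, so no explicit set() conversion is needed.
only_in_1 = dict1.keys() - dict2.keys()   # keys found only in dict1
only_in_2 = dict2.keys() - dict1.keys()   # keys found only in dict2
in_both = dict1.keys() & dict2.keys()     # keys that need a decision

dict3 = {k: dict1[k] for k in only_in_1}
dict3.update((k, dict2[k]) for k in only_in_2)
for k in in_both:
    # keep the value whose [1] element is larger; dict1 wins a tie
    dict3[k] = dict1[k] if dict1[k][1] >= dict2[k][1] else dict2[k]

print(only_in_1, only_in_2, in_both)
print(dict3)
```

With the question's data, only key 3 is unique to dict1, no key is unique to dict2, and keys 1 and 2 are shared.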

Answer 4 (score: 0)

import operator

def choose_value(key, x, y):
    """Choose a value from either `x` or `y` per the problem requirements."""
    if key not in x:
        return y[key]
    if key not in y:
        return x[key]
    # "The maximum of x[key] and y[key], ordered by their [1] element"
    return max((x[key], y[key]), key=operator.itemgetter(1))

def merge(x, y):
    # "a dict mapping keys to the chosen value, using the union of the keys
    # from x and y as the result keys"
    return {
        key: choose_value(key, x, y)
        for key in x.keys() | y.keys()
    }