如何在python中有效地找到两个字典之间的所有差异

时间:2018-02-09 16:42:29

标签: python dictionary

所以,我有2个词典,我必须检查缺少的键和匹配键,检查它们是否具有相同或不同的值。

dict1 = {..}
dict2 = {..}
#key values in a list that are missing in each
missing_in_dict1_but_in_dict2 = []
missing_in_dict2_but_in_dict1 = []
#key values in a list that are mismatched between the 2 dictionaries
mismatch = []

最有效的方法是什么?

2 个答案:

答案 0 :(得分:3)

您可以使用dictionary view objects作为。减去集合以获得差异:

missing_in_dict1_but_in_dict2 = dict2.keys() - dict1
missing_in_dict2_but_in_dict1 = dict1.keys() - dict2

对于相同的键,使用交叉点和&运算符:

mismatch = {key for key in dict1.keys() & dict2 if dict1[key] != dict2[key]}

如果您仍在使用Python 2,请使用dict.viewkeys()

使用字典视图生成交集和差异非常有效,视图对象本身非常轻量级从集合操作创建新集合的算法可以直接使用底层字典的O(1)查找行为。

演示:

>>> dict1 = {'foo': 42, 'bar': 81}
>>> dict2 = {'bar': 117, 'spam': 'ham'}
>>> dict2.keys() - dict1
{'spam'}
>>> dict1.keys() - dict2
{'foo'}
>>> [key for key in dict1.keys() & dict2 if dict1[key] != dict2[key]]
{'bar'}

与创建单独的set()对象的性能比较:

>>> import timeit
>>> import random
>>> def difference_views(d1, d2):
...     missing1 = d2.keys() - d1
...     missing2 = d1.keys() - d2
...     mismatch = {k for k in d1.keys() & d2 if d1[k] != d2[k]}
...     return missing1, missing2, mismatch
...
>>> def difference_sets(d1, d2):
...     missing1 = set(d2) - set(d1)
...     missing2 = set(d1) - set(d2)
...     mismatch = {k for k in set(d1) & set(d2) if d1[k] != d2[k]}
...     return missing1, missing2, mismatch
...
>>> testd1 = {random.randrange(1000000): random.randrange(1000000) for _ in range(10000)}
>>> testd2 = {random.randrange(1000000): random.randrange(1000000) for _ in range(10000)}
>>> timeit.timeit('d(d1, d2)', 'from __main__ import testd1 as d1, testd2 as d2, difference_views as d', number=1000)
1.8643521590274759
>>> timeit.timeit('d(d1, d2)', 'from __main__ import testd1 as d1, testd2 as d2, difference_sets as d', number=1000)
2.811345119960606

使用set()个对象的速度较慢,尤其是当您的输入词典变大时。

答案 1 :(得分:2)

一种简单的方法是从dict键创建集合并减去它们:

>>> dict1 = { 'a': 1, 'b': 1 }
>>> dict2 = { 'b': 1, 'c': 1 }
>>> missing_in_dict1_but_in_dict2 = set(dict2) - set(dict1)
>>> missing_in_dict1_but_in_dict2
set(['c'])
>>> missing_in_dict2_but_in_dict1 = set(dict1) - set(dict2)
>>> missing_in_dict2_but_in_dict1
set(['a'])

或者您可以使用dict避免将第二个set投射到.difference()

>>> set(dict1).difference(dict2)
set(['a'])
>>> set(dict2).difference(dict1)
set(['c'])