假设我有两个Python词典 - dictA
和dictB
。我需要查明dictB
中是否存在任何密钥,而dictA
中是否存在密钥。什么是最快的方法呢?
我应该将字典键转换成一个集合,然后再去吗?
有兴趣了解你的想法......
感谢您的回复。
抱歉没有正确陈述我的问题。
我的情况是这样的 - 我有一个dictA
可能与dictB
相同,或者与dictB
相比可能会丢失一些密钥,否则某些密钥的值可能会有所不同必须设置为dictA
键的值。
问题是字典没有标准,并且可以包含可以作为dict字典的值。
说
dictA={'key1':a, 'key2':b, 'key3':{'key11':cc, 'key12':dd}, 'key4':{'key111':{....}}}
dictB={'key1':a, 'key2:':newb, 'key3':{'key11':cc, 'key12':newdd, 'key13':ee}.......
因此'key2'值必须重置为新值,并且必须在dict中添加'key13'。 键值没有固定格式。它可以是一个简单的价值,也可以是字典或dict的字典。
答案 0 :(得分:229)
您可以在键上使用设置操作:
diff = set(dictb.keys()) - set(dicta.keys())
这是一个找到所有可能性的类:添加了什么,删除了什么,哪些键值对是相同的,以及哪些键值对被更改。
class DictDiffer(object):
"""
Calculate the difference between two dictionaries as:
(1) items added
(2) items removed
(3) keys same in both but changed values
(4) keys same in both and unchanged values
"""
def __init__(self, current_dict, past_dict):
self.current_dict, self.past_dict = current_dict, past_dict
self.set_current, self.set_past = set(current_dict.keys()), set(past_dict.keys())
self.intersect = self.set_current.intersection(self.set_past)
def added(self):
return self.set_current - self.intersect
def removed(self):
return self.set_past - self.intersect
def changed(self):
return set(o for o in self.intersect if self.past_dict[o] != self.current_dict[o])
def unchanged(self):
return set(o for o in self.intersect if self.past_dict[o] == self.current_dict[o])
以下是一些示例输出:
>>> a = {'a': 1, 'b': 1, 'c': 0}
>>> b = {'a': 1, 'b': 2, 'd': 0}
>>> d = DictDiffer(b, a)
>>> print "Added:", d.added()
Added: set(['d'])
>>> print "Removed:", d.removed()
Removed: set(['c'])
>>> print "Changed:", d.changed()
Changed: set(['b'])
>>> print "Unchanged:", d.unchanged()
Unchanged: set(['a'])
作为github回购提供: https://github.com/hughdbrown/dictdiffer
答案 1 :(得分:53)
如果你想以递归方式获得差异,我已经为python编写了一个包: https://github.com/seperman/deepdiff
从PyPi安装:
pip install deepdiff
导入
>>> from deepdiff import DeepDiff
>>> from pprint import pprint
>>> from __future__ import print_function # In case running on Python 2
相同的对象返回空
>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = t1
>>> print(DeepDiff(t1, t2))
{}
项目类型已更改
>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:"2", 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{ 'type_changes': { 'root[2]': { 'newtype': <class 'str'>,
'newvalue': '2',
'oldtype': <class 'int'>,
'oldvalue': 2}}}
项目的价值已更改
>>> t1 = {1:1, 2:2, 3:3}
>>> t2 = {1:1, 2:4, 3:3}
>>> pprint(DeepDiff(t1, t2), indent=2)
{'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}
添加和/或删除了项目
>>> t1 = {1:1, 2:2, 3:3, 4:4}
>>> t2 = {1:1, 2:4, 3:3, 5:5, 6:6}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff)
{'dic_item_added': ['root[5]', 'root[6]'],
'dic_item_removed': ['root[4]'],
'values_changed': {'root[2]': {'newvalue': 4, 'oldvalue': 2}}}
字符串差异
>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world"}}
>>> t2 = {1:1, 2:4, 3:3, 4:{"a":"hello", "b":"world!"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { 'root[2]': {'newvalue': 4, 'oldvalue': 2},
"root[4]['b']": { 'newvalue': 'world!',
'oldvalue': 'world'}}}
字符串差异2
>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world!\nGoodbye!\n1\n2\nEnd"}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n1\n2\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'values_changed': { "root[4]['b']": { 'diff': '--- \n'
'+++ \n'
'@@ -1,5 +1,4 @@\n'
'-world!\n'
'-Goodbye!\n'
'+world\n'
' 1\n'
' 2\n'
' End',
'newvalue': 'world\n1\n2\nEnd',
'oldvalue': 'world!\n'
'Goodbye!\n'
'1\n'
'2\n'
'End'}}}
>>>
>>> print (ddiff['values_changed']["root[4]['b']"]["diff"])
---
+++
@@ -1,5 +1,4 @@
-world!
-Goodbye!
+world
1
2
End
输入类型
>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":"world\n\n\nEnd"}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'type_changes': { "root[4]['b']": { 'newtype': <class 'str'>,
'newvalue': 'world\n\n\nEnd',
'oldtype': <class 'list'>,
'oldvalue': [1, 2, 3]}}}
列表差异
>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3, 4]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{'iterable_item_removed': {"root[4]['b'][2]": 3, "root[4]['b'][3]": 4}}
列表差异2:
>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'iterable_item_added': {"root[4]['b'][3]": 3},
'values_changed': { "root[4]['b'][1]": {'newvalue': 3, 'oldvalue': 2},
"root[4]['b'][2]": {'newvalue': 2, 'oldvalue': 3}}}
列出忽略顺序或重复的差异:(使用与上面相同的词典)
>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, 3]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 3, 2, 3]}}
>>> ddiff = DeepDiff(t1, t2, ignore_order=True)
>>> print (ddiff)
{}
包含字典的列表:
>>> t1 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:1, 2:2}]}}
>>> t2 = {1:1, 2:2, 3:3, 4:{"a":"hello", "b":[1, 2, {1:3}]}}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (ddiff, indent = 2)
{ 'dic_item_removed': ["root[4]['b'][2][2]"],
'values_changed': {"root[4]['b'][2][1]": {'newvalue': 3, 'oldvalue': 1}}}
设定:
>>> t1 = {1, 2, 8}
>>> t2 = {1, 2, 3, 5}
>>> ddiff = DeepDiff(t1, t2)
>>> pprint (DeepDiff(t1, t2))
{'set_item_added': ['root[3]', 'root[5]'], 'set_item_removed': ['root[8]']}
命名元组:
>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> t1 = Point(x=11, y=22)
>>> t2 = Point(x=11, y=23)
>>> pprint (DeepDiff(t1, t2))
{'values_changed': {'root.y': {'newvalue': 23, 'oldvalue': 22}}}
自定义对象:
>>> class ClassA(object):
... a = 1
... def __init__(self, b):
... self.b = b
...
>>> t1 = ClassA(1)
>>> t2 = ClassA(2)
>>>
>>> pprint(DeepDiff(t1, t2))
{'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}
添加了对象属性:
>>> t2.c = "new attribute"
>>> pprint(DeepDiff(t1, t2))
{'attribute_added': ['root.c'],
'values_changed': {'root.b': {'newvalue': 2, 'oldvalue': 1}}}
答案 2 :(得分:18)
不确定它是否“快”,但通常可以做到这一点
dicta = {"a":1,"b":2,"c":3,"d":4}
dictb = {"a":1,"d":2}
for key in dicta.keys():
if not key in dictb:
print key
答案 3 :(得分:14)
正如Alex Martelli写的那样,如果您只是想检查B中的任何键是否不在A中,any(True for k in dictB if k not in dictA)
就可以了。
找到丢失的密钥:
diff = set(dictB)-set(dictA) #sets
C:\Dokumente und Einstellungen\thc>python -m timeit -s "dictA =
dict(zip(range(1000),range
(1000))); dictB = dict(zip(range(0,2000,2),range(1000)))" "diff=set(dictB)-set(dictA)"
10000 loops, best of 3: 107 usec per loop
diff = [ k for k in dictB if k not in dictA ] #lc
C:\Dokumente und Einstellungen\thc>python -m timeit -s "dictA =
dict(zip(range(1000),range
(1000))); dictB = dict(zip(range(0,2000,2),range(1000)))" "diff=[ k for k in dictB if
k not in dictA ]"
10000 loops, best of 3: 95.9 usec per loop
所以这两种解决方案的速度几乎相同。
答案 4 :(得分:12)
如果你真正的意思是你所说的(你只需要找出中间的“有任何钥匙”而不是A中,而不是那些可能是那些),最快的方法应该是:< / p>
if any(True for k in dictB if k not in dictA): ...
如果你真的需要找出哪个键,如果有,在B中而不在A中,而不只是“IF”有这样的键,那么现有的答案是非常合适的(但我确实建议将来更精确)如果这确实是你的意思,请提出问题; - )。
答案 5 :(得分:7)
set(dictA.keys()).intersection(dictB.keys())
答案 6 :(得分:5)
The top answer by hughdbrown建议使用set difference,这绝对是最好的方法:
diff = set(dictb.keys()) - set(dicta.keys())
这段代码的问题在于它构建两个列表只是为了创建两个集合,因此它浪费了4N时间和2N空间。它也比它需要的复杂一点。
通常,这不是什么大问题,但如果是:
diff = dictb.keys() - dicta
collections.abc.Mapping
has a KeysView
that acts like a Set
。在Python 2中,keys()
返回键的列表,而不是KeysView
。所以你必须直接询问viewkeys()
。
diff = dictb.viewkeys() - dicta
对于双版本2.7 / 3.x代码,您希望使用six
或类似代码,以便使用six.viewkeys(dictb)
:
diff = six.viewkeys(dictb) - dicta
在2.4-2.6中,没有KeysView
。但是你可以通过直接从迭代器中构建你的左集来减少4N到N的成本,而不是先建立一个列表:
diff = set(dictb) - dicta
我有一个与dictB相同的dictA,或者与dictB相比可能会丢失一些键,否则某些键的值可能会有所不同
所以你真的不需要比较关键,但是项目。如果值是可清除的,ItemsView
只是Set
,就像字符串一样。如果是,那很简单:
diff = dictb.items() - dicta.items()
虽然问题不是直接要求递归diff,但是一些示例值是dicts,并且看起来预期的输出会递归地区分它们。这里已经有多个答案显示了如何做到这一点。
答案 7 :(得分:5)
还有一个question in stackoverflow about this argument我不得不承认有一个简单的解决方案:python的datadiff library有助于打印两个词典之间的差异。
答案 8 :(得分:3)
这是一种可行的方法,允许评估为False
的密钥,并且如果可能的话仍然使用生成器表达式尽早退出。但它并不是特别漂亮。
any(map(lambda x: True, (k for k in b if k not in a)))
修改强>
THC4k在另一个答案中回复了我的评论。以上是一种更好,更漂亮的方法:
any(True for k in b if k not in a)
不确定这是怎么回事......
答案 9 :(得分:3)
这是一个老问题,并且比我需要的要少一点,所以这个答案实际上解决的问题不仅仅是这个问题。这个问题的答案帮助我解决了以下问题:
所有这些与JSON相结合,可以提供非常强大的配置存储支持。
解决方案(also on github):
from collections import OrderedDict
from pprint import pprint
class izipDestinationMatching(object):
__slots__ = ("attr", "value", "index")
def __init__(self, attr, value, index):
self.attr, self.value, self.index = attr, value, index
def __repr__(self):
return "izip_destination_matching: found match by '%s' = '%s' @ %d" % (self.attr, self.value, self.index)
def izip_destination(a, b, attrs, addMarker=True):
"""
Returns zipped lists, but final size is equal to b with (if shorter) a padded with nulls
Additionally also tries to find item reallocations by searching child dicts (if they are dicts) for attribute, listed in attrs)
When addMarker == False (patching), final size will be the longer of a, b
"""
for idx, item in enumerate(b):
try:
attr = next((x for x in attrs if x in item), None) # See if the item has any of the ID attributes
match, matchIdx = next(((orgItm, idx) for idx, orgItm in enumerate(a) if attr in orgItm and orgItm[attr] == item[attr]), (None, None)) if attr else (None, None)
if match and matchIdx != idx and addMarker: item[izipDestinationMatching] = izipDestinationMatching(attr, item[attr], matchIdx)
except:
match = None
yield (match if match else a[idx] if len(a) > idx else None), item
if not addMarker and len(a) > len(b):
for item in a[len(b) - len(a):]:
yield item, item
def dictdiff(a, b, searchAttrs=[]):
"""
returns a dictionary which represents difference from a to b
the return dict is as short as possible:
equal items are removed
added / changed items are listed
removed items are listed with value=None
Also processes list values where the resulting list size will match that of b.
It can also search said list items (that are dicts) for identity values to detect changed positions.
In case such identity value is found, it is kept so that it can be re-found during the merge phase
@param a: original dict
@param b: new dict
@param searchAttrs: list of strings (keys to search for in sub-dicts)
@return: dict / list / whatever input is
"""
if not (isinstance(a, dict) and isinstance(b, dict)):
if isinstance(a, list) and isinstance(b, list):
return [dictdiff(v1, v2, searchAttrs) for v1, v2 in izip_destination(a, b, searchAttrs)]
return b
res = OrderedDict()
if izipDestinationMatching in b:
keepKey = b[izipDestinationMatching].attr
del b[izipDestinationMatching]
else:
keepKey = izipDestinationMatching
for key in sorted(set(a.keys() + b.keys())):
v1 = a.get(key, None)
v2 = b.get(key, None)
if keepKey == key or v1 != v2: res[key] = dictdiff(v1, v2, searchAttrs)
if len(res) <= 1: res = dict(res) # This is only here for pretty print (OrderedDict doesn't pprint nicely)
return res
def dictmerge(a, b, searchAttrs=[]):
"""
Returns a dictionary which merges differences recorded in b to base dictionary a
Also processes list values where the resulting list size will match that of a
It can also search said list items (that are dicts) for identity values to detect changed positions
@param a: original dict
@param b: diff dict to patch into a
@param searchAttrs: list of strings (keys to search for in sub-dicts)
@return: dict / list / whatever input is
"""
if not (isinstance(a, dict) and isinstance(b, dict)):
if isinstance(a, list) and isinstance(b, list):
return [dictmerge(v1, v2, searchAttrs) for v1, v2 in izip_destination(a, b, searchAttrs, False)]
return b
res = OrderedDict()
for key in sorted(set(a.keys() + b.keys())):
v1 = a.get(key, None)
v2 = b.get(key, None)
#print "processing", key, v1, v2, key not in b, dictmerge(v1, v2)
if v2 is not None: res[key] = dictmerge(v1, v2, searchAttrs)
elif key not in b: res[key] = v1
if len(res) <= 1: res = dict(res) # This is only here for pretty print (OrderedDict doesn't pprint nicely)
return res
答案 10 :(得分:2)
如果在Python≥2.7:
# update different values in dictB
# I would assume only dictA should be updated,
# but the question specifies otherwise
for k in dictA.viewkeys() & dictB.viewkeys():
if dictA[k] != dictB[k]:
dictB[k]= dictA[k]
# add missing keys to dictA
dictA.update( (k,dictB[k]) for k in dictB.viewkeys() - dictA.viewkeys() )
答案 11 :(得分:2)
怎么样的标准(比较完整对象)
PyDev-&gt;新的PyDev模块 - &gt;模块:unittest
import unittest
class Test(unittest.TestCase):
def testName(self):
obj1 = {1:1, 2:2}
obj2 = {1:1, 2:2}
self.maxDiff = None # sometimes is usefull
self.assertDictEqual(d1, d2)
if __name__ == "__main__":
#import sys;sys.argv = ['', 'Test.testName']
unittest.main()
答案 12 :(得分:1)
以下是深入比较2个词典键的解决方案:
def compareDictKeys(dict1, dict2):
if type(dict1) != dict or type(dict2) != dict:
return False
keys1, keys2 = dict1.keys(), dict2.keys()
diff = set(keys1) - set(keys2) or set(keys2) - set(keys1)
if not diff:
for key in keys1:
if (type(dict1[key]) == dict or type(dict2[key]) == dict) and not compareDictKeys(dict1[key], dict2[key]):
diff = True
break
return not diff
答案 13 :(得分:1)
这是一个可以比较两个以上的决策的解决方案:
def diff_dict(dicts, default=None):
diff_dict = {}
# add 'list()' around 'd.keys()' for python 3 compatibility
for k in set(sum([d.keys() for d in dicts], [])):
# we can just use "values = [d.get(k, default) ..." below if
# we don't care that d1[k]=default and d2[k]=missing will
# be treated as equal
if any(k not in d for d in dicts):
diff_dict[k] = [d.get(k, default) for d in dicts]
else:
values = [d[k] for d in dicts]
if any(v != values[0] for v in values):
diff_dict[k] = values
return diff_dict
用法示例:
import matplotlib.pyplot as plt
diff_dict([plt.rcParams, plt.rcParamsDefault, plt.matplotlib.rcParamsOrig])
答案 14 :(得分:1)
我的两个词典之间的对称差异的配方:
param dict1 dict2
1 a b
2 b a
5 e N\A
6 N\A f
结果是:
{{1}}
答案 15 :(得分:1)
正如其他答案中所提到的,unittest为比较dicts产生了一些不错的输出,但在这个例子中我们不想首先构建一个完整的测试。
刮掉unittest来源,看起来你可以得到一个公平的解决方案:
import difflib
import pprint
def diff_dicts(a, b):
if a == b:
return ''
return '\n'.join(
difflib.ndiff(pprint.pformat(a, width=30).splitlines(),
pprint.pformat(b, width=30).splitlines())
)
所以
dictA = dict(zip(range(7), map(ord, 'python')))
dictB = {0: 112, 1: 'spam', 2: [1,2,3], 3: 104, 4: 111}
print diff_dicts(dictA, dictB)
结果:
{0: 112,
- 1: 121,
- 2: 116,
+ 1: 'spam',
+ 2: [1, 2, 3],
3: 104,
- 4: 111,
? ^
+ 4: 111}
? ^
- 5: 110}
其中:
与unittest类似,唯一需要注意的是,由于尾随逗号/括号,最终映射可以被认为是差异。
答案 16 :(得分:1)
@Maxx有一个很好的答案,使用Python提供的unittest
工具:
import unittest
class Test(unittest.TestCase):
def runTest(self):
pass
def testDict(self, d1, d2, maxDiff=None):
self.maxDiff = maxDiff
self.assertDictEqual(d1, d2)
然后,您可以在代码中的任何位置调用:
try:
Test().testDict(dict1, dict2)
except Exception, e:
print e
结果输出看起来像diff
的输出,用+
或-
在每个不同的行前面打印字典。
答案 17 :(得分:0)
如果你想要一个内置的解决方案来与任意字典结构进行完全比较,@ Maxx的答案是一个良好的开端。
import unittest
test = unittest.TestCase()
test.assertEqual(dictA, dictB)
答案 18 :(得分:0)
基于ghostdog74的回答,
dicta = {"a":1,"d":2}
dictb = {"a":5,"d":2}
for value in dicta.values():
if not value in dictb.values():
print value
将打印不同的dicta值
答案 19 :(得分:0)
尝试这样找到de intersection,这两个词都在dictionarie中,如果你想在第二个词典中找不到键,只需使用不在 ...
intersect = filter(lambda x, dictB=dictB.keys(): x in dictB, dictA.keys())
答案 20 :(得分:0)
不确定它是否仍然相关但是我遇到了这个问题,我的情况我只需要返回所有嵌套词典等变化的字典。无法找到一个好的解决方案,但我确实结束了{ {3}}。希望这会有所帮助,