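I have two dictionaries and want to compare them and list the differences. I considered comparing them directly as dictionaries, but after looking at other answers here, that does not seem easy. Would another approach be to turn them into DataFrames with pandas? The same column should count as the same even when the order differs, so the check should go by name.
For example, if 'KAEK' has the same name, data type, and length in both but is listed further down in the second dictionary because the two are ordered differently, it should not be treated as a difference. How can I do this?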
pst.schema
{'properties': OrderedDict([('KAEK', 'str:12'),
('PROP_TYPE', 'str:4'),
('ORI_TYPE', 'int:1'),
('ORI_CODE', 'str:100'),
('DEC_ID', 'str:254'),
('ADDRESS', 'str:254'),
('NUM', 'str:9'),
('LEN', 'float:19.11'),
('AREA', 'float:19.11')]),
'geometry': 'Polygon'}
pst2.schema
{'properties': OrderedDict([('OBJECTID_1', 'int:9'),
('OBJECTID', 'int:9'),
('FID_PERIVL', 'int:9'),
('DESC_', 'str:254'),
('PROP_TYPE', 'str:4'),
('Shape_Leng', 'float:19.11'),
('Shape_Le_1', 'float:19.11'),
('Shape_Area', 'float:19.11'),
('PARCEL_COD', 'str:254'),
('KAEK', 'str:50'),
('NUM', 'int:4'),
('DEC_ID', 'int:4'),
('ADDRESS', 'int:4'),
('ORI_CODE', 'int:4'),
('ORI_TYPE', 'int:4')]),
'geometry': 'Polygon'}
I was thinking of putting the columns in the same order, like this:
df = pd.DataFrame(pst2, columns=['NUM', 'DEC_ID','OBJECTID_1'])#place all the columns
#which doesn't work
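But if I do that, any gap caused by a column that exists in only one of the dictionaries throws the comparison off. For example, if the columns in the first are: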
A,B,C
and in the second:
A,B,B2,C
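they would not be compared correctly. So the comparison has to be done by name.
In summary: compare the two and show every name/type combination that differs, along with any extra columns that exist in only one of them. Something like: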
'ADDRESS', 'str:254' #from 1st dictionary
'ADDRESS', 'int:4' #from 2nd dictionary
My attempt to show which dictionary each entry belongs to:
pprint(set(('d1', el) if el in d1.items() else ('d2', el) for el in d2))
{('d2', 'ADDRESS'),
('d2', 'DEC_ID'),
('d2', 'DESC_'),
('d2', 'FID_PERIVL'),
('d2', 'KAEK'),
('d2', 'NUM'),
('d2', 'OBJECTID'),
('d2', 'OBJECTID_1'),
('d2', 'ORI_CODE'),
('d2', 'ORI_TYPE'),
('d2', 'PARCEL_COD'),
('d2', 'PROP_TYPE'),
('d2', 'Shape_Area'),
('d2', 'Shape_Le_1'),
('d2', 'Shape_Leng')}
The correct output would show the differing entries from both dictionaries.
Answer 0 (score: 2)
If you just want to find the symmetric difference between the two OrderedDicts:
>>> from collections import OrderedDict
>>> d1 = {'properties': OrderedDict([('KAEK', 'str:12'),
... ('PROP_TYPE', 'str:4'),
... ('ORI_TYPE', 'int:1')...
>>> d1 = d1['properties']
>>> d2 = {'properties': OrderedDict([('OBJECTID_1', 'int:9'),
... ('OBJECTID', 'int:9'),
... ('FID_PERIVL', 'int:9')...
>>> d2 = d2['properties']
>>> from pprint import pprint
>>> pprint(d1)
OrderedDict([('KAEK', 'str:12'),
('PROP_TYPE', 'str:4'),
('ORI_TYPE', 'int:1')...
>>> pprint(d2)
OrderedDict([('OBJECTID_1', 'int:9'),
('OBJECTID', 'int:9'),
('FID_PERIVL', 'int:9')...
>>> pprint(set.symmetric_difference(set(d1.items()), set(d2.items())))
{('ADDRESS', 'int:4'),
('ADDRESS', 'str:254'),
('AREA', 'float:19.11'),
('DEC_ID', 'int:4'),
('DEC_ID', 'str:254'),
('DESC_', 'str:254'),
('FID_PERIVL', 'int:9'),
('KAEK', 'str:12'),
('KAEK', 'str:50'),
('LEN', 'float:19.11'),
('NUM', 'int:4'),
('NUM', 'str:9'),
('OBJECTID', 'int:9'),
('OBJECTID_1', 'int:9'),
('ORI_CODE', 'int:4'),
('ORI_CODE', 'str:100'),
('ORI_TYPE', 'int:1'),
('ORI_TYPE', 'int:4'),
('PARCEL_COD', 'str:254'),
('Shape_Area', 'float:19.11'),
('Shape_Le_1', 'float:19.11'),
('Shape_Leng', 'float:19.11')}
Then use the result however you like.
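If the goal is a report keyed by column name, a related sketch is to work with the dictionaries' key views instead of their item sets. The helper name `diff_by_name` is hypothetical, and the two dictionaries below are a shortened subset of the values in the question, not the full schemas.

```python
from collections import OrderedDict

# Shortened subset of the two 'properties' mappings from the question.
d1 = OrderedDict([('KAEK', 'str:12'), ('PROP_TYPE', 'str:4'), ('AREA', 'float:19.11')])
d2 = OrderedDict([('OBJECTID', 'int:9'), ('PROP_TYPE', 'str:4'), ('KAEK', 'str:50')])


def diff_by_name(a, b):
    """Compare two mappings by key, ignoring the order of their entries."""
    # Keys present on one side only.
    only_a = {k: a[k] for k in a.keys() - b.keys()}
    only_b = {k: b[k] for k in b.keys() - a.keys()}
    # Keys present on both sides whose values differ.
    changed = {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]}
    return only_a, only_b, changed


only_d1, only_d2, changed = diff_by_name(d1, d2)
print(only_d1)   # columns that exist only in d1
print(only_d2)   # columns that exist only in d2
print(changed)   # shared columns with a different type or length
```

Because key views support set operations, the position of a column in either dictionary has no effect on the result; 'PROP_TYPE' is identical on both sides here and so appears in none of the three groups.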
Following the OP's further edit to the question:
>>> d3 = set.symmetric_difference(set(d1.items()), set(d2.items()))
>>> pprint(set(('d1', el) if el in d1.items() else ('d2', el) for el in d3))
{('d1', ('ADDRESS', 'str:254')),
('d1', ('AREA', 'float:19.11')),
('d1', ('DEC_ID', 'str:254')),
('d1', ('KAEK', 'str:12')),
('d1', ('LEN', 'float:19.11')),
('d1', ('NUM', 'str:9')),
('d1', ('ORI_CODE', 'str:100')),
('d1', ('ORI_TYPE', 'int:1')),
('d2', ('ADDRESS', 'int:4')),
('d2', ('DEC_ID', 'int:4')),
('d2', ('DESC_', 'str:254')),
('d2', ('FID_PERIVL', 'int:9')),
('d2', ('KAEK', 'str:50')),
('d2', ('NUM', 'int:4')),
('d2', ('OBJECTID', 'int:9')),
('d2', ('OBJECTID_1', 'int:9')),
('d2', ('ORI_CODE', 'int:4')),
('d2', ('ORI_TYPE', 'int:4')),
('d2', ('PARCEL_COD', 'str:254')),
('d2', ('Shape_Area', 'float:19.11')),
('d2', ('Shape_Le_1', 'float:19.11')),
('d2', ('Shape_Leng', 'float:19.11'))}
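Since the question distinguishes data type from length, the labelled pairs above could be broken down further. The following is a minimal sketch that assumes every value has the form 'type:width' as in the output shown; `split_field` and `describe_change` are hypothetical helper names, not part of any library.

```python
def split_field(spec):
    """Split a 'type:width' string such as 'str:12' into ('str', '12')."""
    kind, _, width = spec.partition(':')
    return kind, width


def describe_change(name, spec1, spec2):
    """Report whether the type, the width, or both differ for one column."""
    kind1, width1 = split_field(spec1)
    kind2, width2 = split_field(spec2)
    parts = []
    if kind1 != kind2:
        parts.append(f"type {kind1} -> {kind2}")
    if width1 != width2:
        parts.append(f"width {width1} -> {width2}")
    if not parts:
        return f"{name}: identical"
    return f"{name}: " + ", ".join(parts)


# Values taken from the two schemas in the question.
print(describe_change('KAEK', 'str:12', 'str:50'))
print(describe_change('ADDRESS', 'str:254', 'int:4'))
```

The width is kept as a string on purpose, so that a value like '19.11' is compared exactly as written rather than being converted to a number.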