Compare dictionaries and show only the differences in Python?

Posted: 2018-06-25 07:39:58

Tags: python pandas dictionary

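I have two dictionaries and want to compare them and list the differences. Since they are dictionaries, this doesn't look easy after reading other answers here. Would an alternative be to turn them into pandas DataFrames? Columns that are the same but appear in a different order should count as the same, so the check should be done by name.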

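For example, if 'KAEK' has the same name, data type and length in both dictionaries, it should not be treated as different just because it is listed further down in the second one; the two dictionaries are simply ordered differently. How can I do this?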
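As a quick illustration of why comparing by name sidesteps the ordering problem: in Python, comparing two `OrderedDict` objects is order-sensitive, while comparing plain `dict`s (or looking a value up by key) is not. The two mappings below are a hypothetical subset of the schemas, reordered on purpose.

```python
from collections import OrderedDict

# Hypothetical subset of the two schemas: same content, different order.
a = OrderedDict([('KAEK', 'str:12'), ('NUM', 'str:9')])
b = OrderedDict([('NUM', 'str:9'), ('KAEK', 'str:12')])

# OrderedDict == OrderedDict also compares the order of the entries...
print(a == b)                  # False
# ...but as plain dicts only the names and values matter.
print(dict(a) == dict(b))      # True
# Looking a column up by name never depends on its position.
print(a['KAEK'] == b['KAEK'])  # True
```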

pst.schema

{'properties': OrderedDict([('KAEK', 'str:12'),
              ('PROP_TYPE', 'str:4'),
              ('ORI_TYPE', 'int:1'),
              ('ORI_CODE', 'str:100'),
              ('DEC_ID', 'str:254'),
              ('ADDRESS', 'str:254'),
              ('NUM', 'str:9'),
              ('LEN', 'float:19.11'),
              ('AREA', 'float:19.11')]),
 'geometry': 'Polygon'}


pst2.schema

{'properties': OrderedDict([('OBJECTID_1', 'int:9'),
              ('OBJECTID', 'int:9'),
              ('FID_PERIVL', 'int:9'),
              ('DESC_', 'str:254'),
              ('PROP_TYPE', 'str:4'),
              ('Shape_Leng', 'float:19.11'),
              ('Shape_Le_1', 'float:19.11'),
              ('Shape_Area', 'float:19.11'),
              ('PARCEL_COD', 'str:254'),
              ('KAEK', 'str:50'),
              ('NUM', 'int:4'),
              ('DEC_ID', 'int:4'),
              ('ADDRESS', 'int:4'),
              ('ORI_CODE', 'int:4'),
              ('ORI_TYPE', 'int:4')]),
 'geometry': 'Polygon'}

I was thinking of putting them in order like this:

df = pd.DataFrame(pst2, columns=['NUM', 'DEC_ID', 'OBJECTID_1'])  # place all the columns
# which doesn't work

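But if I did that, any gaps left by columns that differ between the two dictionaries would throw things off. For example, if the columns in the first are: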

A,B,C

and in the second:

A,B,B2,C

they would not be compared correctly. So the comparison should be done by name.
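One way to make the A,B,C versus A,B,B2,C case position-independent is to treat the column names as sets: dict key views support set operations directly, so an extra column like B2 shows up without shifting anything else. The type strings here are hypothetical placeholders, not values from the question.

```python
# Hypothetical column -> type mappings mirroring the A,B,C / A,B,B2,C example.
first = {'A': 'int:4', 'B': 'str:10', 'C': 'float:19.11'}
second = {'A': 'int:4', 'B': 'str:10', 'B2': 'str:5', 'C': 'float:19.11'}

only_in_first = first.keys() - second.keys()    # set()
only_in_second = second.keys() - first.keys()   # {'B2'}
in_both = first.keys() & second.keys()          # {'A', 'B', 'C'}

# For names present in both, compare the types by name, never by position.
changed = {name for name in in_both if first[name] != second[name]}  # set()

print(only_in_first, only_in_second, in_both, changed)
```

Because lookups go through the key, inserting B2 between B and C has no effect on how A, B and C are compared.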

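In summary: compare them and show any (name, type) pair that differs between the two, plus any extra columns that exist in only one of them, something like this: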

'ADDRESS', 'str:254'         #from 1st dictionary
'ADDRESS', 'int:4'           #from 2nd dictionary
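A per-name report in roughly that shape can be sketched by walking the union of both key sets and keeping only the names whose values differ. `schema_diff` is a hypothetical helper, and it assumes `None` never appears as a real type string, since it uses `None` to mark a column missing from one side. The sample data is a reordered subset of the two schemas above.

```python
from collections import OrderedDict


def schema_diff(d1, d2):
    """Hypothetical helper: map each differing column name to (type in d1, type in d2).

    A column missing from one dict is reported as None on that side.
    """
    diff = {}
    for name in sorted(d1.keys() | d2.keys()):
        v1, v2 = d1.get(name), d2.get(name)
        if v1 != v2:
            diff[name] = (v1, v2)
    return diff


# Subset of the question's schemas, deliberately in a different order.
d1 = OrderedDict([('KAEK', 'str:12'), ('PROP_TYPE', 'str:4'),
                  ('ADDRESS', 'str:254'), ('AREA', 'float:19.11')])
d2 = OrderedDict([('PROP_TYPE', 'str:4'), ('ADDRESS', 'int:4'),
                  ('KAEK', 'str:50'), ('Shape_Area', 'float:19.11')])

for name, (v1, v2) in schema_diff(d1, d2).items():
    print(f"{name!r}: {v1!r} (1st dictionary) vs {v2!r} (2nd dictionary)")
```

PROP_TYPE is identical in both, so it is left out even though it sits at a different position in each dictionary.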

Trying to show which dictionary each item belongs to:

 pprint(set(('d1', el) if el in d1.items() else ('d2', el) for el in d2))


{('d2', 'ADDRESS'),
 ('d2', 'DEC_ID'),
 ('d2', 'DESC_'),
 ('d2', 'FID_PERIVL'),
 ('d2', 'KAEK'),
 ('d2', 'NUM'),
 ('d2', 'OBJECTID'),
 ('d2', 'OBJECTID_1'),
 ('d2', 'ORI_CODE'),
 ('d2', 'ORI_TYPE'),
 ('d2', 'PARCEL_COD'),
 ('d2', 'PROP_TYPE'),
 ('d2', 'Shape_Area'),
 ('d2', 'Shape_Le_1'),
 ('d2', 'Shape_Leng')}

This labels everything as 'd2'. What I actually need is to show the differences between the two dictionaries.
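The attempt above marks every entry as 'd2' because iterating over a dict yields only its keys, while `d1.items()` contains `(key, value)` tuples, so a bare key is never found there. A small check on a hypothetical subset of the data shows the mismatch:

```python
from collections import OrderedDict

# Hypothetical subset of the two schemas.
d1 = OrderedDict([('KAEK', 'str:12'), ('NUM', 'str:9')])
d2 = OrderedDict([('NUM', 'int:4'), ('KAEK', 'str:50'), ('DESC_', 'str:254')])

# `for el in d2` gives bare key strings, not (key, value) pairs.
print(list(d2))                          # ['NUM', 'KAEK', 'DESC_']
# A bare key is never a member of the items view...
print('KAEK' in d1.items())              # False
# ...only a full (key, value) tuple can match.
print(('KAEK', 'str:12') in d1.items())  # True
```

So the membership test has to compare `(key, value)` pairs on both sides, for example by iterating over `d2.items()` or over a set of item pairs.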

1 Answer:

Answer 0 (score: 2)

If you just want the symmetric difference between the two OrderedDicts:

>>> from collections import OrderedDict

>>> d1 = {'properties': OrderedDict([('KAEK', 'str:12'),
...               ('PROP_TYPE', 'str:4'),
...               ('ORI_TYPE', 'int:1')...

>>> d1 = d1['properties']

>>> d2 = {'properties': OrderedDict([('OBJECTID_1', 'int:9'),
...               ('OBJECTID', 'int:9'),
...               ('FID_PERIVL', 'int:9')...

>>> d2 = d2['properties']

>>> from pprint import pprint
>>> pprint(d1)
OrderedDict([('KAEK', 'str:12'),
             ('PROP_TYPE', 'str:4'),
             ('ORI_TYPE', 'int:1')...

>>> pprint(d2)
OrderedDict([('OBJECTID_1', 'int:9'),
             ('OBJECTID', 'int:9'),
             ('FID_PERIVL', 'int:9')...

>>> pprint(set.symmetric_difference(set(d1.items()), set(d2.items())))
{('ADDRESS', 'int:4'),
 ('ADDRESS', 'str:254'),
 ('AREA', 'float:19.11'),
 ('DEC_ID', 'int:4'),
 ('DEC_ID', 'str:254'),
 ('DESC_', 'str:254'),
 ('FID_PERIVL', 'int:9'),
 ('KAEK', 'str:12'),
 ('KAEK', 'str:50'),
 ('LEN', 'float:19.11'),
 ('NUM', 'int:4'),
 ('NUM', 'str:9'),
 ('OBJECTID', 'int:9'),
 ('OBJECTID_1', 'int:9'),
 ('ORI_CODE', 'int:4'),
 ('ORI_CODE', 'str:100'),
 ('ORI_TYPE', 'int:1'),
 ('ORI_TYPE', 'int:4'),
 ('PARCEL_COD', 'str:254'),
 ('Shape_Area', 'float:19.11'),
 ('Shape_Le_1', 'float:19.11'),
 ('Shape_Leng', 'float:19.11')}

Then use the result in whatever way you want.
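As a side note, dict item views are already set-like in Python 3 (as long as the values are hashable, which holds for these type strings), so `d1.items() ^ d2.items()` should give the same set as building two sets and calling `set.symmetric_difference`. A sketch on a hypothetical subset of the schemas:

```python
from collections import OrderedDict

# Hypothetical subset of the two schemas.
d1 = OrderedDict([('KAEK', 'str:12'), ('PROP_TYPE', 'str:4'), ('NUM', 'str:9')])
d2 = OrderedDict([('PROP_TYPE', 'str:4'), ('NUM', 'int:4'), ('KAEK', 'str:50')])

# The approach from this answer: build two sets of (name, type) pairs.
via_sets = set.symmetric_difference(set(d1.items()), set(d2.items()))
# Same result straight from the item views; ^ returns a plain set.
via_views = d1.items() ^ d2.items()

print(via_sets == via_views)  # True
```

Either form drops pairs that are identical in both dictionaries (here PROP_TYPE), regardless of where they appear.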

Following the OP's further edit:

>>> d3 = set.symmetric_difference(set(d1.items()), set(d2.items()))
>>> pprint(set(('d1', el) if el in d1.items() else ('d2', el) for el in d3))
{('d1', ('ADDRESS', 'str:254')),
 ('d1', ('AREA', 'float:19.11')),
 ('d1', ('DEC_ID', 'str:254')),
 ('d1', ('KAEK', 'str:12')),
 ('d1', ('LEN', 'float:19.11')),
 ('d1', ('NUM', 'str:9')),
 ('d1', ('ORI_CODE', 'str:100')),
 ('d1', ('ORI_TYPE', 'int:1')),
 ('d2', ('ADDRESS', 'int:4')),
 ('d2', ('DEC_ID', 'int:4')),
 ('d2', ('DESC_', 'str:254')),
 ('d2', ('FID_PERIVL', 'int:9')),
 ('d2', ('KAEK', 'str:50')),
 ('d2', ('NUM', 'int:4')),
 ('d2', ('OBJECTID', 'int:9')),
 ('d2', ('OBJECTID_1', 'int:9')),
 ('d2', ('ORI_CODE', 'int:4')),
 ('d2', ('ORI_TYPE', 'int:4')),
 ('d2', ('PARCEL_COD', 'str:254')),
 ('d2', ('Shape_Area', 'float:19.11')),
 ('d2', ('Shape_Le_1', 'float:19.11')),
 ('d2', ('Shape_Leng', 'float:19.11'))}