How do I compare and merge two DataFrames on a common column that holds different dictionaries?
I have the following two DataFrames:
import pandas as pd
df1 = pd.DataFrame({'name':['tom','keith','sam','joe'],'assets':[{'laptop':1,'scanner':2},{'laptop':1,'printer':3}, {'car':12,'keys':34},{'power-cables':24}]})
df2 = pd.DataFrame({'place':['ca','bal-vm'],'default_assets':[{'laptop':4,'printer':3,'scanner':2,'bag':8},{'car':12,'keys':34,'mat':24,'holder':45}]})
df1:
   name   assets
0  tom    {'laptop':1,'scanner':2}
1  keith  {'laptop':1,'printer':3}
2  sam    {'car':12,'keys':34}
3  joe    {'power-cables':24}
df2:
   place   default_assets
0  ca      {'laptop':4,'printer':3,'scanner':2,'bag':8}
1  bal-vm  {'car':12,'keys':34,'mat':24,'holder':45}
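A row of df1 should be merged with a row of df2 when all keys of its df1.assets dictionary are present in that row's df2.default_assets dictionary; otherwise place and default_assets should be filled with None.
So the resulting df should be: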
df:
   name   place   assets                    default_assets
0  tom    ca      {'laptop':1,'scanner':2}  {'laptop':4,'printer':3,'scanner':2,'bag':8}
1  keith  ca      {'laptop':1,'printer':3}  {'laptop':4,'printer':3,'scanner':2,'bag':8}
2  sam    bal-vm  {'car':12,'keys':34}      {'car':12,'keys':34,'mat':24,'holder':45}
3  joe    None    {'power-cables':24}       None
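The matching rule described above can be sketched in plain Python, independent of pandas. This is only an illustration: the helper name `find_defaults` and the `candidates` list are hypothetical and not part of the original question. It relies on the fact that `dict.keys()` returns a set-like view, so `<=` tests whether one key set is a subset of another.

```python
def find_defaults(assets, candidates):
    """Return the first (place, defaults) pair whose keys cover every key
    of `assets`, or (None, None) when no candidate qualifies."""
    for place, defaults in candidates:
        # dict.keys() is a set-like view, so <= means "is a subset of"
        if assets.keys() <= defaults.keys():
            return place, defaults
    return None, None


# The two rows of df2 from the question, as (place, default_assets) pairs
candidates = [
    ('ca', {'laptop': 4, 'printer': 3, 'scanner': 2, 'bag': 8}),
    ('bal-vm', {'car': 12, 'keys': 34, 'mat': 24, 'holder': 45}),
]

print(find_defaults({'laptop': 1, 'scanner': 2}, candidates)[0])  # ca
print(find_defaults({'power-cables': 24}, candidates)[0])         # None
```

Note that an empty `assets` dictionary is a subset of everything, so under this sketch it would match the first candidate.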
Answer 0 (score: 2)
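You can do the following: perform a cross join of df1 and df2, keep only the rows where the keys of assets are all contained in default_assets, and then add back the rows of df1 whose df1.assets keys are not all in any df2.default_assets. For example: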
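# cross join: pair every row of df1 with every row of df2
merged = df1.assign(key=1).merge(df2.assign(key=1), on='key').drop('key', axis=1)
# mask to filter: True where every key of assets is also a key of default_assets
# (<= is the subset test on key views; < would reject dictionaries with identical keys)
mask = [asset.keys() <= default.keys() for asset, default in zip(merged['assets'], merged['default_assets'])]
# add those not in the mask; drop_duplicates keeps the first (matched) row per name
result = pd.concat([merged.loc[mask], df1], sort=True).drop_duplicates('name')
# print in full
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(result)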
Output
assets \
0 {'laptop': 1, 'scanner': 2}
2 {'laptop': 1, 'printer': 3}
5 {'car': 12, 'keys': 34}
3 {'power-cables': 24}
default_assets name place
0 {'laptop': 4, 'printer': 3, 'scanner': 2, 'bag... tom ca
2 {'laptop': 4, 'printer': 3, 'scanner': 2, 'bag... keith ca
5 {'car': 12, 'keys': 34, 'mat': 24, 'holder': 45} sam bal-vm
3 NaN joe NaN
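As a variation on the same idea, here is a minimal self-contained sketch that assumes pandas 1.2 or later, where `merge` accepts `how='cross'`, so the temporary `key` column is not needed. It also uses a left join on `name` in place of `concat` plus `drop_duplicates`, which keeps the original row order and index of df1. This is a swapped-in variant, not the answer's exact method, and the intermediate name `matched` is introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'name': ['tom', 'keith', 'sam', 'joe'],
    'assets': [{'laptop': 1, 'scanner': 2}, {'laptop': 1, 'printer': 3},
               {'car': 12, 'keys': 34}, {'power-cables': 24}],
})
df2 = pd.DataFrame({
    'place': ['ca', 'bal-vm'],
    'default_assets': [{'laptop': 4, 'printer': 3, 'scanner': 2, 'bag': 8},
                       {'car': 12, 'keys': 34, 'mat': 24, 'holder': 45}],
})

# Pair every df1 row with every df2 row (requires pandas >= 1.2)
merged = df1.merge(df2, how='cross')

# Keep pairs where every key of `assets` also appears in `default_assets`
mask = [a.keys() <= d.keys()
        for a, d in zip(merged['assets'], merged['default_assets'])]

# If a name matches several places, keep only the first match
matched = merged.loc[mask].drop_duplicates('name')

# Left join back onto df1 so unmatched names survive with missing values
result = df1.merge(matched[['name', 'place', 'default_assets']],
                   on='name', how='left')

with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(result)
```

Unmatched rows such as joe end up with NaN in `place` and `default_assets`, as in the output above. The cross join creates len(df1) * len(df2) rows, so this approach is best suited to small or moderately sized frames.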