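How can we merge two DataFrames on a column that holds nested dictionaries? We need to update df1 from df2 in the "actions" column. Is there a way to achieve this with the available methods such as concat, append, and merge?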
import pandas as pd

df1 = pd.DataFrame([
    {
        "id": "87c4b5a0db9f49c49f766436c9582297",
        "actions": {
            "sample": [
                {
                    "tagvalue": "test",
                    "status": "created"
                },
                {
                    "tagvalue": "test2",
                    "status": "created"
                }
            ]
        }
    },
    {
        "id": "87c4b5a0db9f49c49f766436c9582298",
        "actions": {
            "sample": [
                {
                    "tagvalue": "test2",
                    "status": "created"
                }
            ]
        }
    }
])
df2 = pd.DataFrame([
    {
        "id": "87c4b5a0db9f49c49f766436c9582297",
        "actions": {
            "sample": [
                {
                    "tagvalue": "test",
                    "status": "updated"
                }
            ]
        }
    }
])
df1.set_index('id', inplace=True)
df2.set_index('id', inplace=True)
# Need to merge the data based on id
# TODO : Right way to merge to get the following output
finalOutputExpectaion = [
    {
        "id": "87c4b5a0db9f49c49f766436c9582297",
        "actions": {
            "sample": [
                {
                    "tagvalue": "test",
                    "status": "updated"
                },
                {
                    "tagvalue": "test2",
                    "status": "created"
                }
            ]
        }
    },
    {
        "id": "87c4b5a0db9f49c49f766436c9582298",
        "actions": {
            "sample": [
                {
                    "tagvalue": "test2",
                    "status": "created"
                }
            ]
        }
    }
]
Note: finalOutputExpectaion is the updated DataFrame as a list of dicts (we will get it by using to_dict(orient='records')). Python version: 3.7, pandas version: 1.1.0
Answer 0 (score: 0)
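First join the DataFrames df1 and df2 on id, then zip the actions columns from the left and right DataFrames in a list comprehension and use a custom merge function to update the dictionaries: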
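def merge(d1, d2):
    # No matching row in df2 (NaN after the left join), or nothing in df1: keep d1 as is.
    if pd.isna(d1) or pd.isna(d2):
        return d1
    # Tags present in the update take precedence over the same tags in d1.
    tags = set(d['tagvalue'] for d in d2['sample'])
    # Note: += extends d2['sample'] in place, so the dicts stored in df2 are modified too.
    d2['sample'] += [d for d in d1['sample'] if d['tagvalue'] not in tags]
    return d2

# Left join on the index (id); df2's column comes through as 'actions_r'.
m = df1.join(df2, lsuffix='', rsuffix='_r')
df1['actions'] = [merge(*v) for v in zip(m['actions'], m['actions_r'])]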
Result:
actions
id
87c4b5a0db9f49c49f766436c9582297 {'sample': [{'tagvalue': 'test', 'status': 'updated'}, {'tagvalue': 'test2', 'status': 'created'}]}
87c4b5a0db9f49c49f766436c9582298 {'sample': [{'tagvalue': 'test2', 'status': 'created'}]}
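The merge function above extends the list inside df2's dictionaries in place and puts the updated entries first. If df2 needs to stay untouched, or the original order of the entries in df1 matters, a variant along the following lines may help. It is only a sketch: the name merge_actions is hypothetical, and it assumes every actions value is either NaN or a dict with a "sample" list of flat dicts, and that id is unique in df2.

```python
import pandas as pd


def merge_actions(old, new):
    """Return a new dict with old's entries updated from new, matched on 'tagvalue'."""
    # After the left join, rows without a match in df2 carry NaN instead of a dict.
    if not (isinstance(old, dict) and isinstance(new, dict)):
        return old
    updates = {d["tagvalue"]: d for d in new["sample"]}
    # Keep the order of old; replace an entry when the update has the same tag.
    merged = [dict(updates.pop(d["tagvalue"], d)) for d in old["sample"]]
    # Tags that exist only in the update are appended at the end.
    merged += [dict(d) for d in updates.values()]
    return {"sample": merged}


df1 = pd.DataFrame([
    {"id": "87c4b5a0db9f49c49f766436c9582297", "actions": {"sample": [
        {"tagvalue": "test", "status": "created"},
        {"tagvalue": "test2", "status": "created"}]}},
    {"id": "87c4b5a0db9f49c49f766436c9582298", "actions": {"sample": [
        {"tagvalue": "test2", "status": "created"}]}},
]).set_index("id")
df2 = pd.DataFrame([
    {"id": "87c4b5a0db9f49c49f766436c9582297", "actions": {"sample": [
        {"tagvalue": "test", "status": "updated"}]}},
]).set_index("id")

# Left join on the index; df2's column is available as 'actions_r'.
m = df1.join(df2, rsuffix="_r")
df1["actions"] = [merge_actions(a, b) for a, b in zip(m["actions"], m["actions_r"])]

# Back to the list-of-dicts shape asked for in the question.
records = df1.reset_index().to_dict(orient="records")
```

For the sample data in the question, records should have the same content as finalOutputExpectaion, and df2 keeps its single "test" entry because the merged dictionaries are built from shallow copies.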