Suppose we have two pandas DataFrames, as follows:
import pandas as pd
df1 = pd.DataFrame({'id': ['a', 'b', 'c']})
df1
id
0 a
1 b
2 c
df2 = pd.DataFrame({'ids': [['b','c'], ['a', 'b'], ['a', 'z']],
'info': ['asdf', 'zxcv', 'sdfg']})
df2
ids info
0 [b, c] asdf
1 [a, b] zxcv
2 [a, z] sdfg
How can I merge/join df1 with the rows of df2 whose df2.ids list contains df1.id?
In other words, how do I get the following:
df3
id ids info
0 a [a, b] zxcv
1 a [a, z] sdfg
2 b [b, c] asdf
3 b [a, b] zxcv
4 c [b, c] asdf
And also a version of the above aggregated on id, like this:
df3
id ids info
0 a [[a, b], [a, z]] [zxcv, sdfg]
1 b [[b, c], [a, b]] [asdf, zxcv]
2 c [[b, c]] [asdf]
I tried the following:
df1.merge(df2, how = 'left', left_on = 'id', right_on = 'ids')
TypeError: unhashable type: 'list'
df1.id.isin(df2.ids)
TypeError: unhashable type: 'list'
Answer 0 (score: 2)
df = df2.set_index('info').ids.apply(pd.Series)\
.stack().reset_index(0, name='id').merge(df2)\
.merge(df1, how='right').sort_values('id')\
.reset_index(drop=True)
print(df)
info id ids
0 zxcv a [a, b]
1 sdfg a [a, z]
2 asdf b [b, c]
3 zxcv b [a, b]
4 asdf c [b, c]
For the aggregation, use:
df = df.groupby('id', as_index=False).agg(list)
print(df)
id info ids
0 a [zxcv, sdfg] [[a, b], [a, z]]
1 b [asdf, zxcv] [[b, c], [a, b]]
2 c [asdf] [[b, c]]
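On pandas 0.25 or later, the same result can probably be reached more directly with DataFrame.explode, which is a different technique from the apply(pd.Series)/stack chain above. The following is only a sketch under that version assumption; the names exploded and agg are illustrative and do not come from the answer. It copies the list column into a new id column, explodes that copy into one scalar per row, and then merges on the scalar key.

```python
import pandas as pd

df1 = pd.DataFrame({"id": ["a", "b", "c"]})
df2 = pd.DataFrame({
    "ids": [["b", "c"], ["a", "b"], ["a", "z"]],
    "info": ["asdf", "zxcv", "sdfg"],
})

# Keep the original lists in 'ids' and explode a copy of them into a
# scalar 'id' column: one row per (list element, original row) pair.
exploded = df2.assign(id=df2["ids"]).explode("id")

# Scalar keys are hashable, so an ordinary merge works. The inner join
# drops 'z', which has no match in df1.
df3 = df1.merge(exploded, on="id", how="inner")[["id", "ids", "info"]]

# Aggregated version: collect the matching rows of each id into lists.
agg = df3.groupby("id", as_index=False).agg(list)

print(df3)
print(agg)
```

Because the list is exploded rather than split into a fixed number of columns, this sketch should not care how many elements each list in ids holds.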
Answer 1 (score: 0)
Use:
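# Split each list into two scalar columns (assumes every list has exactly two elements)
df2[['id1','id2']] = pd.DataFrame(df2.ids.values.tolist(), index= df2.index)
# Match df1.id against the first and the second list element separately
new_df1 = pd.merge(df1, df2, how='inner', left_on=['id'], right_on = ['id1'])
new_df2 = pd.merge(df1, df2, how='inner', left_on=['id'], right_on = ['id2'])
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
new_df = pd.concat([new_df1, new_df2])[['id','ids','info']]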
Output:
id ids info
0 a [a, b] zxcv
1 a [a, z] sdfg
2 b [b, c] asdf
0 b [a, b] zxcv
1 c [b, c] asdf
Aggregation part:
new_df.groupby('id')[['ids', 'info']].agg(list)
Output:
ids info
id
a [[a, b], [a, z]] [zxcv, sdfg]
b [[b, c], [a, b]] [asdf, zxcv]
c [[b, c]] [asdf]
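The id1/id2 column split in this answer only works when every list in ids has exactly two elements. A minimal sketch that drops that assumption is shown below; it swaps the merge for a plain membership test, and the name rows is illustrative rather than part of the answer. It compares every id against every row of df2, so it is probably only suitable for small frames.

```python
import pandas as pd

df1 = pd.DataFrame({"id": ["a", "b", "c"]})
df2 = pd.DataFrame({
    "ids": [["b", "c"], ["a", "b"], ["a", "z"]],
    "info": ["asdf", "zxcv", "sdfg"],
})

# For each id in df1, keep every df2 row whose list contains that id.
# Lists may have any length; no fixed id1/id2 columns are needed.
rows = [
    {"id": key, "ids": ids, "info": info}
    for key in df1["id"]
    for ids, info in zip(df2["ids"], df2["info"])
    if key in ids
]
df3 = pd.DataFrame(rows, columns=["id", "ids", "info"])

print(df3)
```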