I have three Pandas DataFrames, df1, df2, and df3, as shown below:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'id' : ['one', 'two', 'three'], 'score': [56, 45, 78]})
df2 = pd.DataFrame({'id' : ['one', 'five', 'four'], 'score': [35, 81, 90]})
df3 = pd.DataFrame({'id' : ['five', 'two', 'six'], 'score': [23, 66, 42]})
How can I join these DataFrames on id and then place their score columns side by side? The desired output is:
#join_and_concatenate by id:
id score(df1) score(df2) score(df3)
one 56 35 NaN
two 45 NaN 66
three 78 NaN NaN
four NaN 90 NaN
five NaN 81 23
six NaN NaN 42
I found a related page that discusses merge(), concatenate(), and join(), but I'm not sure whether any of them does what I need.
Answer 0 (score: 4)
There may be a better way using concat, but this should work:
In [48]: pd.merge(df1, df2, how='outer', on='id').merge(df3, how='outer', on='id')
Out[48]:
id score_x score_y score
0 one 56 35 NaN
1 two 45 NaN 66
2 three 78 NaN NaN
3 five NaN 81 23
4 four NaN 90 NaN
5 six NaN NaN 42
[6 rows x 4 columns]
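The score_x and score_y names in that output come from the default suffixes=('_x', '_y') that merge applies when both frames have a non-key column with the same name. The third score column keeps its plain name because nothing clashes with it by then. As a minimal sketch (the suffix strings and the two_way variable are illustrative assumptions, not part of the original answer), the suffixes can be set explicitly on a two-frame merge:

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['one', 'two', 'three'], 'score': [56, 45, 78]})
df2 = pd.DataFrame({'id': ['one', 'five', 'four'], 'score': [35, 81, 90]})

# Outer merge keeps every id from both frames; the suffixes are appended
# to the overlapping 'score' column from the left and right frame.
two_way = pd.merge(df1, df2, how='outer', on='id', suffixes=('(df1)', '(df2)'))

print(two_way.columns.tolist())
```

This only helps directly for the first two frames; with a third frame the remaining score column still has to be renamed separately.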
To get the output you want:
In [54]: merged = pd.merge(df1, df2, how='outer', on='id').merge(df3, how='outer', on='id')
In [55]: merged.set_index('id').rename(columns={'score_x': 'score(df1)', 'score_y': 'score(df2)', 'score': 'score(df3)'})
Out[55]:
score(df1) score(df2) score(df3)
id
one 56 35 NaN
two 45 NaN 66
three 78 NaN NaN
five NaN 81 23
four NaN 90 NaN
six NaN NaN 42
[6 rows x 3 columns]
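For comparison, here is a rough sketch of the concat route mentioned at the start of this answer. It assumes each id appears at most once per frame, and the frames dict and wide variable are names made up for illustration. Each frame is reduced to a score Series indexed by id, renamed after its source, and the Series are then concatenated column-wise so that pandas aligns them on the union of ids:

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['one', 'two', 'three'], 'score': [56, 45, 78]})
df2 = pd.DataFrame({'id': ['one', 'five', 'four'], 'score': [35, 81, 90]})
df3 = pd.DataFrame({'id': ['five', 'two', 'six'], 'score': [23, 66, 42]})

# Hypothetical labels used to build the column names.
frames = {'df1': df1, 'df2': df2, 'df3': df3}

# One Series per frame, indexed by id and named 'score(<label>)'.
columns = [
    frame.set_index('id')['score'].rename(f'score({label})')
    for label, frame in frames.items()
]

# axis=1 aligns the Series on their index; ids missing from a frame become NaN.
wide = pd.concat(columns, axis=1)

print(wide)
```

The row order may differ from the desired output, and the scores are shown as floats because of the NaN entries. If a specific order matters, wide.reindex([...]) with an explicit list of ids is one option.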