Python Pandas: Join on unique column values and concatenate

Time: 2014-01-07 15:29:50

Tags: python join merge pandas concatenation

I have three Pandas DataFrames, df1, df2, and df3, as follows:

import pandas as pd
import numpy as np
df1 = pd.DataFrame({'id' : ['one', 'two', 'three'], 'score': [56, 45, 78]})
df2 = pd.DataFrame({'id' : ['one', 'five', 'four'], 'score': [35, 81, 90]})
df3 = pd.DataFrame({'id' : ['five', 'two', 'six'], 'score': [23, 66, 42]})

How can I join these DataFrames on id and then place their score columns side by side? The desired output is:

#join_and_concatenate by id:

id   score(df1)  score(df2)  score(df3)
one    56            35         NaN
two    45            NaN        66
three  78            NaN        NaN
four   NaN           90         NaN
five   NaN           81         23
six    NaN           NaN        42

I found a related page that discusses merge(), concatenate(), and join(), but I'm not sure whether any of these does what I need.

1 answer:

Answer 0: (score: 4)

There may be a better way using concat, but this should work:

In [48]: pd.merge(df1, df2, how='outer', on='id').merge(df3, how='outer', on='id')
Out[48]: 
      id  score_x  score_y  score
0    one       56       35    NaN
1    two       45      NaN     66
2  three       78      NaN    NaN
3   five      NaN       81     23
4   four      NaN       90    NaN
5    six      NaN      NaN     42

[6 rows x 4 columns]
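
A note on where those column names come from: merge only adds suffixes (by default '_x' and '_y') to non-key columns whose names collide, so the first merge produces score_x and score_y, while the second one leaves df3's score untouched because nothing on the left is called score any more. As a rough sketch, assuming you would rather pick the names up front, you can pass suffixes to the first merge and rename df3's column before the second. The names step1, step2, and the score_df1 / score_df2 / score_df3 labels are just illustrative choices:

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['one', 'two', 'three'], 'score': [56, 45, 78]})
df2 = pd.DataFrame({'id': ['one', 'five', 'four'], 'score': [35, 81, 90]})
df3 = pd.DataFrame({'id': ['five', 'two', 'six'], 'score': [23, 66, 42]})

# Both frames have a 'score' column, so the suffixes are applied here.
step1 = pd.merge(df1, df2, how='outer', on='id', suffixes=('_df1', '_df2'))

# No column on the left is named 'score' now, so no suffix would be added;
# rename df3's column explicitly to keep the naming consistent.
step2 = step1.merge(df3.rename(columns={'score': 'score_df3'}),
                    how='outer', on='id')

print(list(step2.columns))
# ['id', 'score_df1', 'score_df2', 'score_df3']
```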

To get exactly the output you want:

In [54]: merged = pd.merge(df1, df2, how='outer', on='id').merge(df3, how='outer', on='id')

In [55]: merged.set_index('id').rename(columns={'score_x': 'score(df1)', 'score_y': 'score(df2)', 'score': 'score(df3)'})
Out[55]: 
       score(df1)  score(df2)  score(df3)
id                                       
one            56          35         NaN
two            45         NaN          66
three          78         NaN         NaN
five          NaN          81          23
four          NaN          90         NaN
six           NaN         NaN          42

[6 rows x 3 columns]
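
For the concat route mentioned at the top, here is a minimal sketch rather than a definitive answer. It assumes the id values are unique within each DataFrame (concat along axis=1 cannot align duplicated index labels), and the frames dict and the wide variable are names made up for this example. Each frame is reduced to a score Series indexed by id and named after its target column, then the Series are aligned side by side:

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['one', 'two', 'three'], 'score': [56, 45, 78]})
df2 = pd.DataFrame({'id': ['one', 'five', 'four'], 'score': [35, 81, 90]})
df3 = pd.DataFrame({'id': ['five', 'two', 'six'], 'score': [23, 66, 42]})

# Map each desired output column name to its source frame.
frames = {'score(df1)': df1, 'score(df2)': df2, 'score(df3)': df3}

# Index every frame by id, keep only the score column, and name the
# resulting Series after its target column. axis=1 aligns the Series on
# id and fills ids missing from a frame with NaN (an outer join).
wide = pd.concat(
    [df.set_index('id')['score'].rename(name) for name, df in frames.items()],
    axis=1,
)

print(wide)
```

This scales to any number of frames without chaining merge calls, and the column names are set up front so no suffix handling or renaming is needed afterwards. The row order may differ from the merge output.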