合并两个具有相同列

时间:2017-02-07 12:22:15

标签: python pandas dataframe merge

我想合并df_1和df_2来创建df_merged,但我想合并两个共同的列,而不是创建A_x和A_y之类的列。

index = [np.array(['foo', 'foo', 'qux', 'qux']),
         np.array(['one', 'two', 'one', 'two',])]
columns = ["A",  "B"]
df_1 = pd.DataFrame(np.random.randn(4, 2), index=index, columns=columns)

index = [np.array(['bar', 'bar', 'baz', 'baz',]),
         np.array(['one', 'two', 'one', 'two',])]
columns = ["A",  "B"]
df_2 = pd.DataFrame(np.random.randn(4, 2), index=index, columns=columns)

df_merge = pd.merge(df_1, df_2, left_index=True, right_index=True, how='outer')

print df_1
print df_2
print df_merge

df_1

                A         B
foo one  2.082229  1.575985
    two -0.805592  0.444195
qux one  0.368874  0.253556
    two  1.017632 -0.471978

df_2

                A         B
bar one  0.134571  0.415209
    two -1.288889 -0.144284
baz one -0.117345 -0.095292
    two -0.256708 -0.682542

df_merge - 当前输出

              A_x       B_x       A_y       B_y
bar one       NaN       NaN  0.134571  0.415209
    two       NaN       NaN -1.288889 -0.144284
baz one       NaN       NaN -0.117345 -0.095292
    two       NaN       NaN -0.256708 -0.682542
foo one  2.082229  1.575985       NaN       NaN
    two -0.805592  0.444195       NaN       NaN
qux one  0.368874  0.253556       NaN       NaN
    two  1.017632 -0.471978       NaN       NaN

df_merge - 所需

              A         B       
bar one        0.134571  0.415209
    two       -1.288889 -0.144284
baz one       -0.117345 -0.095292
    two       -0.256708 -0.682542
foo one        2.082229  1.575985       
    two       -0.805592  0.444195       
qux one        0.368874  0.253556      
    two        1.017632 -0.471978       

1 个答案:

答案 0 :(得分:3)

最简单的是使用concat,默认情况下,'outer'连接并连接特定轴上的pandas对象(此处为axis=0,默认值):

print (pd.concat([df_1,df_2]))

                A         B
foo one -0.329887 -0.966898
    two  0.552272 -1.964264
qux one -0.629764 -0.765578
    two -0.148118  0.904920
bar one  0.305685 -1.269400
    two  1.256213 -0.686447
baz one -2.194461  0.529666
    two -1.487217 -0.520045

如有必要,请sort_index

print (pd.concat([df_1,df_2]).sort_index())

                A         B
bar one -0.463547 -0.002780
    two -0.421346 -1.730840
baz one -0.086068  1.179000
    two  0.756876 -0.492985
foo one -0.223900 -0.302643
    two  0.460265  0.216632
qux one -0.296815  0.799978
    two -0.420700  1.147312