I have two dataframes, df1 and df2, and I want to join them on their column 'C'.
import pandas
df1 = pandas.DataFrame(data=[[1,0,2,4],[2,3,1,3]],columns=['A','B','C','D'])
df2 = pandas.DataFrame(data=[[2,2,2,4],[3,4,1,3]],columns=['A','F','C','D'])
df1
   A  B  C  D
0  1  0  2  4
1  2  3  1  3
df2
   A  F  C  D
0  2  2  2  4
1  3  4  1  3
# Merge the dataframes
dataframe_matched = df1.join(
    other=df2.set_index('C'),
    on='C',
    how="inner",
    lsuffix="_left",
    rsuffix="_right",
    sort=True,
)
dataframe_matched
   A_left  B  C  D_left  A_right  F  D_right
1       2  3  1       3        3  4        3
0       1  0  2       4        2  2        4
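The columns D_left and D_right are identical. Is there a simple way to keep just one of them, under the original name? The desired output would be: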
dataframe_matched
   A_left  B  C  D  A_right  F
1       2  3  1  3        3  4
0       1  0  2  4        2  2
Answer 0 (score: 1)
You can transpose the merged result and use drop_duplicates to remove the repeated column:
df1.merge(df2,on='C').T.drop_duplicates().T
Out[288]:
   A_x  B  C  D_x  A_y  F
0    1  0  2    4    2  2
1    2  3  1    3    3  4
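Note that the output above still carries the suffixed name D_x rather than D. If D is known to agree between the two frames for every matching C (an assumption that holds for the sample data in the question, but not necessarily in general), one possible sketch is to treat D as part of the join key, so that it is never suffixed:

```python
import pandas as pd

df1 = pd.DataFrame(data=[[1, 0, 2, 4], [2, 3, 1, 3]], columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(data=[[2, 2, 2, 4], [3, 4, 1, 3]], columns=['A', 'F', 'C', 'D'])

# Assumption: D has the same value in both frames wherever C matches,
# so joining on both columns gives the same rows as joining on C alone.
merged = df1.merge(df2, on=['C', 'D'], suffixes=('_left', '_right'))

# Only A still clashes, so only A receives the suffixes.
print(merged.columns.tolist())
```

If D differed for some value of C, that row would be dropped by the inner join instead of producing two D columns, so this only fits when the columns really are redundant.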
Update
# assumes: import pandas as pd
pd.concat([df1.set_index('C'),df2.set_index('C')],axis=1,keys=['right','left']).\
    T.reset_index(level=1).\
    drop_duplicates().set_index('level_1',append=True).T
Out[337]:
        right       left
level_1     A  B  D    A  F
C
2           1  0  4    2  2
1           2  3  3    3  4
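When it is not known in advance which overlapping columns are identical, the check can be made after the merge. The following is a minimal sketch, not part of the answer above: merge_collapsing_equal is a hypothetical helper name, and it assumes an ordinary merge on a single key with the default inner join.

```python
import pandas as pd


def merge_collapsing_equal(left, right, on, suffixes=('_left', '_right')):
    """Hypothetical helper: merge, then collapse overlapping columns
    whose left and right copies turn out to be identical."""
    merged = left.merge(right, on=on, suffixes=suffixes)
    lsuf, rsuf = suffixes
    for col in left.columns.intersection(right.columns):
        lcol, rcol = f"{col}{lsuf}", f"{col}{rsuf}"
        # The join key itself is not suffixed, so it is skipped here.
        if lcol not in merged.columns or rcol not in merged.columns:
            continue
        if merged[lcol].equals(merged[rcol]):
            # Keep one copy and give it back its original name.
            merged = merged.drop(columns=rcol).rename(columns={lcol: col})
    return merged


df1 = pd.DataFrame(data=[[1, 0, 2, 4], [2, 3, 1, 3]], columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(data=[[2, 2, 2, 4], [3, 4, 1, 3]], columns=['A', 'F', 'C', 'D'])

result = merge_collapsing_equal(df1, df2, on='C')
print(result)
```

Unlike the transpose approach, this compares only columns that share an original name, so two unrelated columns that happen to hold the same values are left alone.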