I want to combine two DataFrames under certain conditions. I think I need a combination of features from both pd.merge and pd.concat. I read through all the examples here but still couldn't find anything that solves my problem.
left:
key1 key2 valueX valueY
A a1 1 4
B b1 2 5
C c1 3 6
right:
key1 key2 valueX valueY
A a1 7 10
B b2 8 11
C c1 9 12
I want to combine them so that the result looks like this:
key1 key2 valueX valueY valueX valueY
A a1 1 4 7 10
B b1 2 5 nan nan
B b2 nan nan 8 11
C c1 3 6 9 12
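To make the example reproducible, the two frames above can be built roughly as follows. This is only a sketch of the data in the tables; the helper names `left_keys`, `right_keys`, `only_left` and `only_right` are not from the question and are introduced here just to show which key pairs have no partner in the other frame.

```python
import pandas as pd

# Sample frames copied from the tables in the question.
left = pd.DataFrame({
    "key1": ["A", "B", "C"],
    "key2": ["a1", "b1", "c1"],
    "valueX": [1, 2, 3],
    "valueY": [4, 5, 6],
})
right = pd.DataFrame({
    "key1": ["A", "B", "C"],
    "key2": ["a1", "b2", "c1"],
    "valueX": [7, 8, 9],
    "valueY": [10, 11, 12],
})

# Key pairs present in only one frame are the rows that end up padded with NaN.
left_keys = set(zip(left["key1"], left["key2"]))
right_keys = set(zip(right["key1"], right["key2"]))
only_left = left_keys - right_keys    # hypothetical helper: keys missing from right
only_right = right_keys - left_keys   # hypothetical helper: keys missing from left
```

Here (B, b1) exists only in left and (B, b2) only in right, which is why the desired result has four rows rather than three.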
Answer 0 (score: 3)
I'm not sure why you want duplicate column names, but you can do this with concat:
# df1 and df2 are the question's left and right frames
keys = ['key1', 'key2']
Newdf = pd.concat([df1.set_index(keys), df2.set_index(keys)], axis=1).reset_index()
Newdf
Out[711]:
key1 key2 valueX valueY valueX valueY
0 A a1 1.0 4.0 7.0 10.0
1 B b1 2.0 5.0 NaN NaN
2 B b2 NaN NaN 8.0 11.0
3 C c1 3.0 6.0 9.0 12.0
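If the duplicate column names are not strictly required, one alternative (a different technique from the one above, not something the question asked for) is to pass `keys=` to concat so that each frame's columns sit under their own outer level. A minimal sketch, assuming the sample data from the question; the level names "left" and "right" and the variable names `combined` and `right_part` are chosen here for illustration only.

```python
import pandas as pd

left = pd.DataFrame({
    "key1": ["A", "B", "C"],
    "key2": ["a1", "b1", "c1"],
    "valueX": [1, 2, 3],
    "valueY": [4, 5, 6],
})
right = pd.DataFrame({
    "key1": ["A", "B", "C"],
    "key2": ["a1", "b2", "c1"],
    "valueX": [7, 8, 9],
    "valueY": [10, 11, 12],
})

keys = ["key1", "key2"]

# keys= adds an outer column level, so the column labels become tuples such as
# ("left", "valueX") and ("right", "valueX") instead of two plain "valueX" labels.
combined = pd.concat(
    [left.set_index(keys), right.set_index(keys)],
    axis=1,
    keys=["left", "right"],
).sort_index()

# Selecting the outer level returns only that frame's values.
right_part = combined["right"]
```

With this layout `combined["right"]["valueX"]` is unambiguous, whereas with flat duplicate labels a lookup by name returns both columns.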
Answer 1 (score: 3)
Perform a FULL OUTER JOIN with merge, then remove the suffixes afterward.
u = left.merge(right, on=['key1', 'key2'], suffixes=('', '__2'), how='outer')
u.columns = u.columns.str.replace('__2', '')
u
key1 key2 valueX valueY valueX valueY
0 A a1 1.0 4.0 7.0 10.0
1 B b1 2.0 5.0 NaN NaN
2 C c1 3.0 6.0 9.0 12.0
3 B b2 NaN NaN 8.0 11.0
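A slightly more defensive variant of the same idea, as a sketch: anchoring the pattern with `$` means only a trailing `__2` is stripped, in case some other column name happens to contain that text. It assumes the sample frames from the question; the name `both_x` is made up here to show what a lookup by name returns once the labels are duplicated.

```python
import pandas as pd

left = pd.DataFrame({
    "key1": ["A", "B", "C"],
    "key2": ["a1", "b1", "c1"],
    "valueX": [1, 2, 3],
    "valueY": [4, 5, 6],
})
right = pd.DataFrame({
    "key1": ["A", "B", "C"],
    "key2": ["a1", "b2", "c1"],
    "valueX": [7, 8, 9],
    "valueY": [10, 11, 12],
})

# Full outer join on both keys; only the right-hand columns receive a suffix.
u = left.merge(right, on=["key1", "key2"], how="outer", suffixes=("", "__2"))

# Anchor the pattern with $ so that only a trailing "__2" is removed.
u.columns = u.columns.str.replace(r"__2$", "", regex=True)

# With duplicate labels, selecting by name returns a two-column DataFrame,
# not a Series.
both_x = u["valueX"]
```

The row order of an outer merge can differ between pandas versions, so sort on the keys if a particular order matters.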
Answer 2 (score: 2)
You can merge with whitespace as the suffix and strip it afterward.
new_df = df1.merge(df2, on=['key1', 'key2'], suffixes=(' ', ' '), how='outer')
new_df.columns = new_df.columns.str.strip()
key1 key2 valueX valueY valueX valueY
0 A a1 1.0 4.0 7.0 10.0
1 B b1 2.0 5.0 NaN NaN
2 C c1 3.0 6.0 9.0 12.0
3 B b2 NaN NaN 8.0 11.0
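A self-contained sketch of the whitespace-suffix approach on the question's data. One assumption is made here that is not in the answer as written: the two suffixes differ in length (one space and two spaces), so the intermediate column labels stay distinct until they are stripped. The name `intermediate` is introduced only to show those labels.

```python
import pandas as pd

left = pd.DataFrame({
    "key1": ["A", "B", "C"],
    "key2": ["a1", "b1", "c1"],
    "valueX": [1, 2, 3],
    "valueY": [4, 5, 6],
})
right = pd.DataFrame({
    "key1": ["A", "B", "C"],
    "key2": ["a1", "b2", "c1"],
    "valueX": [7, 8, 9],
    "valueY": [10, 11, 12],
})

# Assumption: one space for the left frame and two for the right, so the
# overlapping columns get different labels during the merge.
new_df = left.merge(right, on=["key1", "key2"], how="outer", suffixes=(" ", "  "))
intermediate = list(new_df.columns)

# Stripping the whitespace restores the original names, now duplicated.
new_df.columns = new_df.columns.str.strip()
```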