Question

I want to combine two DataFrames under some conditions. I think I need features of both pd.merge and pd.concat. I read through all the examples here but still couldn't find what I need to solve my problem.

left:

key1  key2  valueX  valueY
 A    a1     1       4
 B    b1     2       5
 C    c1     3       6

right:

key1  key2  valueX  valueY
 A    a1     7       10
 B    b2     8       11
 C    c1     9       12

I want to combine them so that the result is:

merged on the 2 keys, side by side (axis=1)
an outer join
keeping the valueX, valueY names unchanged, just appending the new columns to the right with the same column names

like below:

    key1  key2  valueX  valueY  valueX  valueY
     A    a1     1       4       7       10
     B    b1     2       5      nan      nan
     B    b2    nan     nan      8       11
     C    c1     3       6       9       12

Answer 1

Not sure why you want duplicate columns, but you can do it using concat (df1 and df2 here are the question's left and right):

Newdf = pd.concat([df1.set_index(['key1', 'key2']),
                   df2.set_index(['key1', 'key2'])], axis=1).reset_index()
Newdf
Out[711]: 
  key1 key2  valueX  valueY  valueX  valueY
0    A   a1     1.0     4.0     7.0    10.0
1    B   b1     2.0     5.0     NaN     NaN
2    B   b2     NaN     NaN     8.0    11.0
3    C   c1     3.0     6.0     9.0    12.0
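
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frames from the question and applies the concat approach above. The `keys` variable and the lowercase `newdf` name are only introduced for this example, and the row order of the result may depend on the pandas version.

```python
import pandas as pd

# Sample frames from the question (called df1/df2 as in this answer).
df1 = pd.DataFrame({"key1": ["A", "B", "C"], "key2": ["a1", "b1", "c1"],
                    "valueX": [1, 2, 3], "valueY": [4, 5, 6]})
df2 = pd.DataFrame({"key1": ["A", "B", "C"], "key2": ["a1", "b2", "c1"],
                    "valueX": [7, 8, 9], "valueY": [10, 11, 12]})

keys = ["key1", "key2"]  # illustrative name, not from the original answer

# Move the keys into the index so concat aligns rows on (key1, key2),
# place the two frames side by side, then turn the keys back into columns.
newdf = pd.concat([df1.set_index(keys), df2.set_index(keys)], axis=1).reset_index()

print(newdf)
```

Rows whose key pair exists in only one frame get NaN in the other frame's columns, which is why the value columns come back as floats.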

Answer 2

Perform a FULL OUTER JOIN with merge, and remove the suffixes afterward.

u = left.merge(right, on=['key1', 'key2'], suffixes=('', '__2'), how='outer') 
u.columns = u.columns.str.replace('__2', '')

u
  key1 key2  valueX  valueY  valueX  valueY
0    A   a1     1.0     4.0     7.0    10.0
1    B   b1     2.0     5.0     NaN     NaN
2    C   c1     3.0     6.0     9.0    12.0
3    B   b2     NaN     NaN     8.0    11.0
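
One thing to watch with this approach: `str.replace('__2', '')` removes the marker wherever it occurs in a label, not only at the end. Below is a rough sketch of a slightly more defensive variant that trims the marker only when it is a trailing suffix. The helper name `outer_merge_keep_names` and the `marker` parameter are made up for this example, and it assumes all column labels are strings.

```python
import pandas as pd

def outer_merge_keep_names(left, right, on, marker="__right"):
    """Hypothetical helper: outer-merge on `on`, then drop the temporary suffix."""
    # Left names stay as they are; overlapping right names get the marker.
    merged = left.merge(right, on=on, how="outer", suffixes=("", marker))
    # Trim the marker only where it appears at the very end of a label.
    merged.columns = [c[:-len(marker)] if c.endswith(marker) else c
                      for c in merged.columns]
    return merged

# Sample frames from the question.
left = pd.DataFrame({"key1": ["A", "B", "C"], "key2": ["a1", "b1", "c1"],
                     "valueX": [1, 2, 3], "valueY": [4, 5, 6]})
right = pd.DataFrame({"key1": ["A", "B", "C"], "key2": ["a1", "b2", "c1"],
                      "valueX": [7, 8, 9], "valueY": [10, 11, 12]})

u = outer_merge_keep_names(left, right, on=["key1", "key2"])
print(u)
```

Pick a marker that cannot be the ending of a real column name in your data, otherwise that column would be renamed as well.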

Answer 3

You can merge with a space as the suffix on both sides and strip it afterwards:

new_df = df1.merge(df2, on=['key1', 'key2'], suffixes=(' ', ' '), how='outer')
new_df.columns = new_df.columns.str.strip()

    key1    key2    valueX  valueY  valueX  valueY
0   A       a1      1.0     4.0     7.0     10.0
1   B       b1      2.0     5.0     NaN     NaN
2   C       c1      3.0     6.0     9.0     12.0
3   B       b2      NaN     NaN     8.0     11.0
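
Whichever answer you use, the result has repeated column labels, and that changes how selection behaves. A small sketch, assuming the sample frames from the question and the space-suffix merge above (`both_x` and `right_x` are names invented for this example):

```python
import pandas as pd

# Sample frames from the question (called df1/df2 as in this answer).
df1 = pd.DataFrame({"key1": ["A", "B", "C"], "key2": ["a1", "b1", "c1"],
                    "valueX": [1, 2, 3], "valueY": [4, 5, 6]})
df2 = pd.DataFrame({"key1": ["A", "B", "C"], "key2": ["a1", "b2", "c1"],
                    "valueX": [7, 8, 9], "valueY": [10, 11, 12]})

new_df = df1.merge(df2, on=["key1", "key2"], suffixes=(" ", " "), how="outer")
new_df.columns = new_df.columns.str.strip()

# Selecting a duplicated label returns every column with that label,
# so this is a two-column DataFrame rather than a Series.
both_x = new_df["valueX"]

# To pick one of them, fall back to positional access:
# column 2 is the left-hand valueX, column 4 the right-hand one.
right_x = new_df.iloc[:, 4]

print(both_x)
print(right_x)
```

If later code needs to tell the two sets of values apart by name, keeping distinct suffixes is usually easier to work with than duplicate labels.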

Merging pandas DataFrames without changing the original column names
