Merging two irregular DataFrames in Python

Time: 2018-06-05 09:14:20

Tags: python python-3.x pandas dataframe merge

I have two DataFrames, df1 and df2.

df1:

    ID      Range(US)            Count(US)          Mean(US)
0   690      1-3                 266                4.0
1            4-7                 277                NaN
2   354      1-3                 233                2.0
3            4-7                 85                 NaN
4   947      1-3                 156                4.0

df2:

    ID      Range(UK)            Count(UK)          Mean(UK)
0   690      1-3                 186                4.0
1            4-7                 25                 NaN
2   354      1-3                 44                 1.0
3   947      1-3                 213                3.0
4            4-7                 33                 NaN

I merge them with this code:

In: df = df1.merge(df2, left_on='deviceid', right_on='deviceid', how='left')
    df

In the result, some rows are repeated: where a matching value does not exist, the existing values show up again.
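A minimal sketch of how the repetition can arise, assuming the blank ID cells are stored as empty strings (the question does not show how they are stored) and that the key column is `ID`, as in the tables, rather than `deviceid`. The frames are retyped from the tables above with the Mean columns left out for brevity.

```python
import pandas as pd

# Data retyped from the question; blank IDs are ASSUMED to be empty strings.
df1 = pd.DataFrame({
    "ID": ["690", "", "354", "", "947"],
    "Range(US)": ["1-3", "4-7", "1-3", "4-7", "1-3"],
    "Count(US)": [266, 277, 233, 85, 156],
})
df2 = pd.DataFrame({
    "ID": ["690", "", "354", "947", ""],
    "Range(UK)": ["1-3", "4-7", "1-3", "1-3", "4-7"],
    "Count(UK)": [186, 25, 44, 213, 33],
})

# Merging on ID alone: each blank ID in df1 matches both blank IDs in df2,
# so every blank-ID row of df1 appears twice in the result.
repeated = df1.merge(df2, on="ID", how="left")
print(len(repeated))
print(repeated)
```

Under these assumptions the five rows of df1 become seven, because the two blank-ID rows are each paired with both blank-ID rows of df2.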

But the expected output is:

    ID   Range(US)  Count(US)  Mean(US)  Range(UK)  Count(UK)  Mean(UK)
0   690  1-3        266        4.0       1-3        186        4.0
1        4-7        277        NaN       4-7        25         NaN
2        4-7        277        NaN       4-7        33         NaN
3   354  1-3        233        2.0       1-3        44         1.0
4        4-7        85         NaN       4-7        25         NaN
5        4-7        85         NaN       4-7        33         NaN
6   947  1-3        156        4.0       1-3        213        3.0

1 answer:

Answer 0: (score: 1)

First, remove the step that replaces duplicated values in the ID column of both DataFrames, so that every row keeps its ID:

#df1['ID'] = df1['ID'].mask(df1['ID'].duplicated(), '')
#df2['ID'] = df2['ID'].mask(df2['ID'].duplicated(), '')

print (df1)
    ID Range(US)  Count(US)  Mean(US)
0  690       1-3        266       4.0
1  690       4-7        277       NaN
2  354       1-3        233       2.0
3  354       4-7         85       NaN
4  947       1-3        156       4.0

print (df2)
    ID Range(UK)  Count(UK)  Mean(UK)
0  690       1-3        186       4.0
1  690       4-7         25       NaN
2  354       1-3         44       1.0
3  947       1-3        213       3.0
4  947       4-7         33       NaN
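If the blanking step cannot simply be removed because the data already arrives with blank IDs, one possible way to get back to frames like the ones printed above is to forward-fill the ID column. This is a minimal sketch, assuming the blanks are empty strings and that a blank means "same ID as the row above"; neither is stated in the question. Only the columns needed for the illustration are retyped.

```python
import pandas as pd

# ASSUMPTION: blank ID cells are empty strings and inherit the ID above them.
df1 = pd.DataFrame({
    "ID": ["690", "", "354", "", "947"],
    "Range(US)": ["1-3", "4-7", "1-3", "4-7", "1-3"],
})
df2 = pd.DataFrame({
    "ID": ["690", "", "354", "947", ""],
    "Range(UK)": ["1-3", "4-7", "1-3", "1-3", "4-7"],
})

for frame in (df1, df2):
    # mask() turns the blanks into NaN; ffill() copies the last seen ID down.
    frame["ID"] = frame["ID"].mask(frame["ID"].eq("")).ffill()

print(df1["ID"].tolist())
print(df2["ID"].tolist())
```

After this step each row carries its own ID, which is what merging on two key columns relies on.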

Then merge on two columns with an outer join:

df = df1.merge(df2, left_on=['ID', 'Range(US)'], right_on=['ID', 'Range(UK)'], how='outer')
print (df)
    ID Range(US)  Count(US)  Mean(US) Range(UK)  Count(UK)  Mean(UK)
0  690       1-3      266.0       4.0       1-3      186.0       4.0
1  690       4-7      277.0       NaN       4-7       25.0       NaN
2  354       1-3      233.0       2.0       1-3       44.0       1.0
3  354       4-7       85.0       NaN       NaN        NaN       NaN
4  947       1-3      156.0       4.0       1-3      213.0       3.0
5  947       NaN        NaN       NaN       4-7       33.0       NaN
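For anyone who wants to run this end to end, here is a self-contained sketch of the same two-key outer merge, with the frames retyped from the printed output above. The `indicator=True` argument is an addition not used in the answer; it adds a `_merge` column showing whether a row was found in both frames or only one.

```python
import pandas as pd

nan = float("nan")

df1 = pd.DataFrame({
    "ID": [690, 690, 354, 354, 947],
    "Range(US)": ["1-3", "4-7", "1-3", "4-7", "1-3"],
    "Count(US)": [266, 277, 233, 85, 156],
    "Mean(US)": [4.0, nan, 2.0, nan, 4.0],
})
df2 = pd.DataFrame({
    "ID": [690, 690, 354, 947, 947],
    "Range(UK)": ["1-3", "4-7", "1-3", "1-3", "4-7"],
    "Count(UK)": [186, 25, 44, 213, 33],
    "Mean(UK)": [4.0, nan, 1.0, 3.0, nan],
})

# Match rows on the (ID, range) pair; the outer join keeps unmatched rows
# from both sides and fills the missing side with NaN.
merged = df1.merge(
    df2,
    left_on=["ID", "Range(US)"],
    right_on=["ID", "Range(UK)"],
    how="outer",
    indicator=True,  # adds "_merge": both / left_only / right_only
)
print(merged)
```

The row order of an outer merge may differ between pandas versions, so sort the result explicitly if the order matters.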