I have two DataFrames, df1:
    ID  Range(US)  Count(US)  Mean(US)
0  690        1-3        266       4.0
1             4-7        277       NaN
2  354        1-3        233       2.0
3             4-7         85       NaN
4  947        1-3        156       4.0
and df2:
    ID  Range(UK)  Count(UK)  Mean(UK)
0  690        1-3        186       4.0
1             4-7         25       NaN
2  354        1-3         44       1.0
3  947        1-3        213       3.0
4             4-7         33       NaN
I merged them with this code:
df = df1.merge(df2, on='ID', how='left')
print (df)
    ID  Range(US)  Count(US)  Mean(US)  Range(UK)  Count(UK)  Mean(UK)
0  690        1-3        266       4.0        1-3        186       4.0
1             4-7        277       NaN        4-7         25       NaN
2             4-7        277       NaN        4-7         33       NaN
3  354        1-3        233       2.0        1-3         44       1.0
4             4-7         85       NaN        4-7         25       NaN
5             4-7         85       NaN        4-7         33       NaN
6  947        1-3        156       4.0        1-3        213       3.0
From the output above we can see that wherever the ID value is missing, the rows are repeated (rows 1-2 and 4-5). That is not the output I expect.
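To see where the repeated rows come from, here is a small reproduction. It is a sketch that assumes the blank IDs are stored as empty strings `''` (what a `mask(..., '')` call would produce); the frames are rebuilt by hand from the tables above. An empty string is an ordinary join key, so every blank-ID row in `df1` matches every blank-ID row in `df2`.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's data: blank IDs stored as ''.
df1 = pd.DataFrame({
    'ID': [690, '', 354, '', 947],
    'Range(US)': ['1-3', '4-7', '1-3', '4-7', '1-3'],
    'Count(US)': [266, 277, 233, 85, 156],
    'Mean(US)': [4.0, np.nan, 2.0, np.nan, 4.0],
})
df2 = pd.DataFrame({
    'ID': [690, '', 354, 947, ''],
    'Range(UK)': ['1-3', '4-7', '1-3', '1-3', '4-7'],
    'Count(UK)': [186, 25, 44, 213, 33],
    'Mean(UK)': [4.0, np.nan, 1.0, 3.0, np.nan],
})

df = df1.merge(df2, on='ID', how='left')

# '' is a normal key value: each of the 2 blank rows in df1 matches
# both blank rows in df2, so 2 x 2 = 4 rows carry an empty ID.
print(len(df))                 # 7
print((df['ID'] == '').sum())  # 4
```

The non-blank IDs (690, 354, 947) each match exactly one row, so only the blank keys multiply.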
Answer (score: 1)
First, remove the lines that replace duplicated IDs with empty strings in both DataFrames, so every row keeps its ID:
#df1['ID'] = df1['ID'].mask(df1['ID'].duplicated(), '')
#df2['ID'] = df2['ID'].mask(df2['ID'].duplicated(), '')
print (df1)
ID Range(US) Count(US) Mean(US)
0 690 1-3 266 4.0
1 690 4-7 277 NaN
2 354 1-3 233 2.0
3 354 4-7 85 NaN
4 947 1-3 156 4.0
print (df2)
ID Range(UK) Count(UK) Mean(UK)
0 690 1-3 186 4.0
1 690 4-7 25 NaN
2 354 1-3 44 1.0
3 947 1-3 213 3.0
4 947 4-7 33 NaN
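If the blanks are already baked into the data, so there is no masking step to remove, one possible way to rebuild the IDs is to coerce the blanks to `NaN` and forward-fill. This sketch assumes the blanks are empty strings and that each blank row belongs to the nearest ID above it, which is what the printed `df1` and `df2` show; the sample frames are hypothetical.

```python
import pandas as pd

# Hypothetical frames with the IDs blanked out ('') as in the question.
df1 = pd.DataFrame({'ID': [690, '', 354, '', 947],
                    'Range(US)': ['1-3', '4-7', '1-3', '4-7', '1-3']})
df2 = pd.DataFrame({'ID': [690, '', 354, 947, ''],
                    'Range(UK)': ['1-3', '4-7', '1-3', '1-3', '4-7']})

for frame in (df1, df2):
    # to_numeric(errors='coerce') turns '' into NaN; ffill copies the last
    # seen ID down into each gap; astype(int) drops the float dtype again.
    frame['ID'] = pd.to_numeric(frame['ID'], errors='coerce').ffill().astype(int)

print(df1['ID'].tolist())  # [690, 690, 354, 354, 947]
print(df2['ID'].tolist())  # [690, 690, 354, 947, 947]
```

This only works if the first row of each frame has a real ID; otherwise the leading `NaN` survives `ffill` and `astype(int)` raises.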
Then merge on both columns with an outer join:
df = df1.merge(df2, left_on=['ID', 'Range(US)'], right_on=['ID', 'Range(UK)'], how='outer')
print (df)
ID Range(US) Count(US) Mean(US) Range(UK) Count(UK) Mean(UK)
0 690 1-3 266.0 4.0 1-3 186.0 4.0
1 690 4-7 277.0 NaN 4-7 25.0 NaN
2 354 1-3 233.0 2.0 1-3 44.0 1.0
3 354 4-7 85.0 NaN NaN NaN NaN
4 947 1-3 156.0 4.0 1-3 213.0 3.0
5 947 NaN NaN NaN 4-7 33.0 NaN
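A variation on the merge above, offered as a sketch rather than a replacement: renaming both range columns to one shared key (the name `Range` is my own choice, not from the question) gives a single range column. `validate='one_to_one'` makes pandas raise a `MergeError` if duplicate keys such as the blank IDs sneak back in, and `indicator=True` adds a `_merge` column showing which rows exist on only one side. Row order of an outer merge can differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Data with the IDs filled in, as printed in the answer above.
df1 = pd.DataFrame({
    'ID': [690, 690, 354, 354, 947],
    'Range(US)': ['1-3', '4-7', '1-3', '4-7', '1-3'],
    'Count(US)': [266, 277, 233, 85, 156],
    'Mean(US)': [4.0, np.nan, 2.0, np.nan, 4.0],
})
df2 = pd.DataFrame({
    'ID': [690, 690, 354, 947, 947],
    'Range(UK)': ['1-3', '4-7', '1-3', '1-3', '4-7'],
    'Count(UK)': [186, 25, 44, 213, 33],
    'Mean(UK)': [4.0, np.nan, 1.0, 3.0, np.nan],
})

df = (df1.rename(columns={'Range(US)': 'Range'})
         .merge(df2.rename(columns={'Range(UK)': 'Range'}),
                on=['ID', 'Range'],
                how='outer',
                validate='one_to_one',  # MergeError if an (ID, Range) pair repeats
                indicator=True))        # '_merge': 'both', 'left_only' or 'right_only'
print(df)
```

Here (354, 4-7) should come out as `left_only` and (947, 4-7) as `right_only`, matching the NaN rows in the answer's output.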