I have a first DataFrame:
df1:
A        B      C  D
Car             0
Bike            0
Train           0
Plane           0
Other_1  Plane  2
Other_2  Plane  3
Other_3  Plane  4
And a second one:
df2:
A B
Car 4 %
Bike 5 %
Train 6 %
Plane 7 %
I would like to combine them so that I get this:
df1:
A        B      C  D
Car             0  4 %
Bike            0  5 %
Train           0  6 %
Plane           0  7 %
Other_1  Plane  2  2
Other_2  Plane  3  3
Other_3  Plane  4  4
What is the best way to do this?
Answer 0 (score: 3)
If df (the question's df1) and df2 have matching indexes, you can use:
df['D'] = df2['B'].combine_first(df['C'])
Output:
A B C D
0 Car NaN 0 4 %
1 Bike NaN 0 5 %
2 Train NaN 0 6 %
3 Plane NaN 0 7 %
4 Other_1 Plane 2 2
5 Other_2 Plane 3 3
6 Other_3 Plane 4 4
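For anyone who wants to reproduce this, here is a minimal sketch of the index-aligned approach. The frame construction is my own reconstruction of the question's data (it assumes the empty cells in B are NaN, that D does not exist yet, and that both frames use the default 0..n index), so treat it as an illustration rather than the asker's actual setup.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's frames (not taken verbatim from the post).
df1 = pd.DataFrame({
    "A": ["Car", "Bike", "Train", "Plane", "Other_1", "Other_2", "Other_3"],
    "B": [np.nan, np.nan, np.nan, np.nan, "Plane", "Plane", "Plane"],
    "C": [0, 0, 0, 0, 2, 3, 4],
})
df2 = pd.DataFrame({
    "A": ["Car", "Bike", "Train", "Plane"],
    "B": ["4 %", "5 %", "6 %", "7 %"],
})

# Both frames carry the default integer index, so labels 0-3 refer to the same rows.
# combine_first keeps df2["B"] where it has a value and falls back to df1["C"]
# for the labels that df2 does not have (4, 5 and 6 here).
df1["D"] = df2["B"].combine_first(df1["C"])

print(df1)
```

This only works because the row labels happen to line up. If df2 were sorted differently, the percentages would land on the wrong rows without any error.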
If the indexes do not line up, you can merge on column A instead:
df_out = df.merge(df2, on ='A', how='left', suffixes=('','y'))
df_out.assign(D = df_out.By.fillna(df_out.C)).drop('By', axis=1)
Or use @piRSquared's improved one-liner:
# the positional axis argument to drop() was removed in pandas 2.0, so use columns=
df.drop(columns='D').merge(df2.rename(columns={'B':'D'}), how='left', on='A')
Output:
A B C D
0 Car NaN 0 4 %
1 Bike NaN 0 5 %
2 Train NaN 0 6 %
3 Plane NaN 0 7 %
4 Other_1 Plane 2 2
5 Other_2 Plane 3 3
6 Other_3 Plane 4 4
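A self-contained sketch of the merge-then-fill variant follows. Again the frames are an assumed reconstruction of the question's data, and df2 is deliberately put in a different row order to show that the match is made on the values in A, not on the index. The `_y` suffix and the `astype(object)` cast are my own choices: the cast just makes it explicit that D mixes strings and integers.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's frames; df2 is in a different order on purpose.
df1 = pd.DataFrame({
    "A": ["Car", "Bike", "Train", "Plane", "Other_1", "Other_2", "Other_3"],
    "B": [np.nan, np.nan, np.nan, np.nan, "Plane", "Plane", "Plane"],
    "C": [0, 0, 0, 0, 2, 3, 4],
})
df2 = pd.DataFrame({
    "A": ["Plane", "Train", "Bike", "Car"],
    "B": ["7 %", "6 %", "5 %", "4 %"],
})

# Left merge keeps every row of df1 in its original order.
# df2's B column collides with df1's B, so it comes through as "B_y".
merged = df1.merge(df2, on="A", how="left", suffixes=("", "_y"))

# Rows without a match in df2 have NaN in "B_y"; fill those from C,
# then drop the helper column.
filled = merged["B_y"].astype(object).fillna(merged["C"])
result = merged.assign(D=filled).drop(columns="B_y")

print(result)
```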
Answer 1 (score: 1)
map
df1.assign(D=df1.A.map(dict(zip(df2.A, df2.B))))
A B C D
0 Car NaN 0 4 %
1 Bike NaN 0 5 %
2 Train NaN 0 6 %
3 Plane NaN 0 7 %
4 Other_1 Plane 2 NaN
5 Other_2 Plane 3 NaN
6 Other_3 Plane 4 NaN
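As the output shows, map on its own leaves NaN for the rows that have no entry in df2. If the fallback to C from the question is wanted, one possible extension (my addition, not part of this answer) is to fill those gaps afterwards. The sketch below uses an assumed reconstruction of the question's frames and a lookup Series in place of the dict; both work with map as long as the keys in df2.A are unique.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's frames.
df1 = pd.DataFrame({
    "A": ["Car", "Bike", "Train", "Plane", "Other_1", "Other_2", "Other_3"],
    "B": [np.nan, np.nan, np.nan, np.nan, "Plane", "Plane", "Plane"],
    "C": [0, 0, 0, 0, 2, 3, 4],
})
df2 = pd.DataFrame({
    "A": ["Car", "Bike", "Train", "Plane"],
    "B": ["4 %", "5 %", "6 %", "7 %"],
})

# Lookup table from vehicle name to percentage string.
lookup = df2.set_index("A")["B"]

# map returns NaN wherever df1["A"] has no key in the lookup.
mapped = df1["A"].map(lookup)

# Fill the unmatched rows from C; the column holds both strings and integers.
result = df1.assign(D=mapped.astype(object).fillna(df1["C"]))

print(result)
```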