I have two DataFrames in pandas (Python):
DF1:
    Fruit    Origin
0   Apple     Spain
1   Apple    France
2   Apple     Italy
3  Banana   Germany
4  Banana  Portugal
5  Grapes    France
6  Grapes     Spain
DF2:
    Fruit
0   Apple
1  Banana
2  Grapes
I want to replace each value in the Fruit column of df1 with the index of that fruit in df2. The result I am looking for is:
DF1:
   Fruit    Origin
0      0     Spain
1      0    France
2      0     Italy
3      1   Germany
4      1  Portugal
5      2    France
6      2     Spain
What I have tried is:
df1['Fruit'] = df1.Fruit.apply(lambda x: df2.index[df2.Fruit == x])
But I am working with a large dataset, so this takes too long. I am looking for a faster way to achieve this.
Answer 0 (score: 1)
I suggest using join. First, set the index of df2 to its Fruit column, keeping the old index as a regular column named index:
df2 = df2.reset_index().set_index('Fruit')
so that df2 becomes:
        index
Fruit
Apple       0
Banana      1
Grapes      2
Now we can write:
>>> df1.join(df2, on='Fruit')
    Fruit    Origin  index
0   Apple     Spain      0
1   Apple    France      0
2   Apple     Italy      0
3  Banana   Germany      1
4  Banana  Portugal      1
5  Grapes    France      2
6  Grapes     Spain      2
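The join above leaves the fruit names in place and adds the number as a separate index column, so one more step is needed to get the layout asked for in the question. Below is a minimal sketch of that last step, rebuilding the two frames from the question so it runs on its own. The names lookup, result_join, fruit_to_idx and result_map are only illustrative. The second option, Series.map, is an alternative I am adding here, not part of the join approach above, and it assumes every fruit name appears exactly once in df2.

```python
import pandas as pd

# Rebuild the two frames from the question.
df1 = pd.DataFrame({
    "Fruit": ["Apple", "Apple", "Apple", "Banana", "Banana", "Grapes", "Grapes"],
    "Origin": ["Spain", "France", "Italy", "Germany", "Portugal", "France", "Spain"],
})
df2 = pd.DataFrame({"Fruit": ["Apple", "Banana", "Grapes"]})

# Lookup table indexed by fruit name, with the old df2 index kept as a column.
lookup = df2.reset_index().set_index("Fruit")

# Option 1: finish the join by overwriting Fruit with the joined 'index' column.
joined = df1.join(lookup, on="Fruit")
result_join = joined.assign(Fruit=joined["index"]).drop(columns="index")

# Option 2 (alternative, assumes fruit names are unique in df2):
# build a name -> index Series and map the column through it.
fruit_to_idx = pd.Series(df2.index, index=df2["Fruit"])
result_map = df1.assign(Fruit=df1["Fruit"].map(fruit_to_idx))

print(result_join)
print(result_map)
```

Both options avoid the per-row boolean scan over df2 that the apply in the question performs. Note that a fruit in df1 that is missing from df2 would come out as NaN with either option, which also turns the column into floats.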