So I have these two DataFrames. This is DF1['nice_in_here']:
nice_in_here
0 NaN
1 Krystyna
2 Piotr
3 Domicela
4 Jaro
This is DF2[['nice_in_there', 'current_club']]:
nice_in_there current_club
0 Krystyna Klub-Duzych-Pup
1 Elżbieta NaN
2 Domicela NaN
3 Piotr Klub-Duzych-Pup
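So what I want is:
For each name in DF1['nice_in_here'], check whether it also appears in DF2['nice_in_there']; if it does, add the corresponding DF2['current_club'] value to that row of DF1.
The result I want (the output of DF1[['nice_in_here', 'current_club']] afterwards):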
nice_in_here current_club
0 NaN NaN
1 Krystyna Klub-Duzych-Pup
2 Piotr Klub-Duzych-Pup
3 Domicela NaN
4 Jaro NaN
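Note that I don't want to drop the NaN rows, because missing values matter to me.
Please help, this is driving me crazy!
Answer 0 (score: 0)
This should work: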
pd.merge(DF1, DF2, how="left", left_on="nice_in_here", right_on="nice_in_there")
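A minimal sketch of how that call might be used end to end, assuming the two frames are rebuilt by hand from the sample data in the question. Dropping the helper column `nice_in_there` afterwards is an extra step added here for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Frames rebuilt from the question's sample data.
DF1 = pd.DataFrame({"nice_in_here": [np.nan, "Krystyna", "Piotr", "Domicela", "Jaro"]})
DF2 = pd.DataFrame({
    "nice_in_there": ["Krystyna", "Elżbieta", "Domicela", "Piotr"],
    "current_club": ["Klub-Duzych-Pup", np.nan, np.nan, "Klub-Duzych-Pup"],
})

# A left merge keeps every row of DF1, including the NaN row and
# names that have no counterpart in DF2.
merged = pd.merge(DF1, DF2, how="left", left_on="nice_in_here", right_on="nice_in_there")

# The right-hand key column is redundant after the merge, so drop it.
result = merged.drop(columns="nice_in_there")
print(result)
```

Note that `pd.merge` returns a new frame; DF1 itself is unchanged unless the result is assigned back.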
Answer 1 (score: 0)
Option 1
You can use Series.map with a dict built from df2:
In [1073]: mapping = dict(df2.values)
In [1074]: df1['current_club'] = df1.nice_in_here.map(mapping); df1
Out[1074]:
nice_in_here current_club
0 NaN NaN
1 Krystyna Klub-Duzych-Pup
2 Piotr Klub-Duzych-Pup
3 Domicela NaN
4 Jaro NaN
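A self-contained sketch of the same idea, assuming the frames are rebuilt from the question's sample data. Instead of `dict(df2.values)`, which relies on df2 having exactly two columns in the right order, this variant builds the lookup as a Series indexed by name. The `drop_duplicates` call is an added safeguard (an assumption, not needed for the sample data), because mapping through a Series with a non-unique index raises an error.

```python
import numpy as np
import pandas as pd

# Frames rebuilt from the question's sample data.
df1 = pd.DataFrame({"nice_in_here": [np.nan, "Krystyna", "Piotr", "Domicela", "Jaro"]})
df2 = pd.DataFrame({
    "nice_in_there": ["Krystyna", "Elżbieta", "Domicela", "Piotr"],
    "current_club": ["Klub-Duzych-Pup", np.nan, np.nan, "Klub-Duzych-Pup"],
})

# Lookup Series: index is the name, value is the club.
# Keep only the first row per name so the index is unique.
lookup = (
    df2.drop_duplicates("nice_in_there")
       .set_index("nice_in_there")["current_club"]
)

# Names missing from the lookup (and NaN itself) map to NaN,
# so no row of df1 is dropped.
df1["current_club"] = df1["nice_in_here"].map(lookup)
print(df1)
```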
Option 2
DataFrame.merge can be used here:
In [1079]: df1.merge(df2, how='left', left_on='nice_in_here', right_on='nice_in_there')[['nice_in_here', 'current_club']]
Out[1079]:
nice_in_here current_club
0 NaN NaN
1 Krystyna Klub-Duzych-Pup
2 Piotr Klub-Duzych-Pup
3 Domicela NaN
4 Jaro NaN
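Since the question also asks to check which names are present in df2, here is a hedged sketch using the `indicator` parameter of `merge`, again assuming the frames are rebuilt from the sample data. It adds a `_merge` column that distinguishes a name found in df2 with no club recorded (Domicela) from a name not found at all (Jaro).

```python
import numpy as np
import pandas as pd

# Frames rebuilt from the question's sample data.
df1 = pd.DataFrame({"nice_in_here": [np.nan, "Krystyna", "Piotr", "Domicela", "Jaro"]})
df2 = pd.DataFrame({
    "nice_in_there": ["Krystyna", "Elżbieta", "Domicela", "Piotr"],
    "current_club": ["Klub-Duzych-Pup", np.nan, np.nan, "Klub-Duzych-Pup"],
})

# indicator=True adds a "_merge" column: "both" when the key was found
# in df2, "left_only" when it exists only in df1.
checked = df1.merge(
    df2,
    how="left",
    left_on="nice_in_here",
    right_on="nice_in_there",
    indicator=True,
)[["nice_in_here", "current_club", "_merge"]]
print(checked)
```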
Performance
The setup uses a dataset with the same structure as df1, but much longer:
df11 = pd.concat([df1] * 10000)
Here are the timings:
%timeit df11.nice_in_here.map(mapping) # map
100 loops, best of 3: 4.49 ms per loop
%timeit df11.merge(df2, how='left', left_on='nice_in_here', right_on='nice_in_there')[df2.columns] # merge
100 loops, best of 3: 9.61 ms per loop
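For readers without IPython, the comparison can be sketched with the standard-library `timeit` module. This assumes the frames are rebuilt from the question's sample data; the helper names `with_map` and `with_merge` are hypothetical, and the absolute numbers will vary with the machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Frames rebuilt from the question's sample data.
df1 = pd.DataFrame({"nice_in_here": [np.nan, "Krystyna", "Piotr", "Domicela", "Jaro"]})
df2 = pd.DataFrame({
    "nice_in_there": ["Krystyna", "Elżbieta", "Domicela", "Piotr"],
    "current_club": ["Klub-Duzych-Pup", np.nan, np.nan, "Klub-Duzych-Pup"],
})

mapping = dict(df2.values)          # name -> club
df11 = pd.concat([df1] * 10000)     # 50,000 rows, same structure as df1


def with_map():
    return df11["nice_in_here"].map(mapping)


def with_merge():
    return df11.merge(df2, how="left", left_on="nice_in_here", right_on="nice_in_there")


# Best of 3 repeats, 10 calls each, reported per call.
map_time = min(timeit.repeat(with_map, number=10, repeat=3)) / 10
merge_time = min(timeit.repeat(with_merge, number=10, repeat=3)) / 10
print(f"map:   {map_time * 1e3:.2f} ms per call")
print(f"merge: {merge_time * 1e3:.2f} ms per call")
```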