Question

So I have these two DataFrames. This is DF1['nice_in_here']:

                         nice_in_here
0                                 NaN
1                            Krystyna
2                               Piotr  
3                            Domicela
4                                Jaro

This is DF2[['nice_in_there', 'current_club']]:

    nice_in_there               current_club
0   Krystyna                    Klub-Duzych-Pup
1   Elżbieta                    NaN
2   Domicela                    NaN
3   Piotr                       Klub-Duzych-Pup

So what I want is:

Check whether each value of DF2["nice_in_there"] is in DF1["nice_in_here"], and if it is, attach the corresponding DF2["current_club"] value to the matching row of DF1["nice_in_here"].

The result I want is (ending up with DF1[['nice_in_here', 'current_club']]):

                         nice_in_here        current_club
0                                 NaN                 NaN
1                            Krystyna     Klub-Duzych-Pup
2                               Piotr     Klub-Duzych-Pup
3                            Domicela                 NaN
4                                Jaro                 NaN

Note that I do not want to drop the NaNs, because missing values matter to me.

Please help, this is driving me crazy!

Answer 1

This should work:

pd.merge(DF1, DF2, how="left", left_on="nice_in_here", right_on="nice_in_there")
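For reference, here is a self-contained sketch of that merge, with the two frames rebuilt from the question. The merge result also carries the right-hand key column nice_in_there, so this sketch drops it to get the two-column shape the question asks for; the names merged and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the question.
DF1 = pd.DataFrame({"nice_in_here": [np.nan, "Krystyna", "Piotr", "Domicela", "Jaro"]})
DF2 = pd.DataFrame({
    "nice_in_there": ["Krystyna", "Elżbieta", "Domicela", "Piotr"],
    "current_club": ["Klub-Duzych-Pup", np.nan, np.nan, "Klub-Duzych-Pup"],
})

# A left merge keeps every row of DF1, including the NaN row and
# names such as "Jaro" that have no match in DF2.
merged = pd.merge(DF1, DF2, how="left", left_on="nice_in_here", right_on="nice_in_there")

# The right-hand key column is redundant after the merge, so drop it.
result = merged.drop(columns="nice_in_there")
print(result)
```

Rows of DF2 with no counterpart in DF1 (here "Elżbieta") are left out, which is what how="left" means.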

Answer 2

Option 1

You can use Series.map:

In [1073]: mapping = dict(df2.values)

In [1074]: df1['current_club'] = df1.nice_in_here.map(mapping); df1
Out[1074]: 
  nice_in_here     current_club
0          NaN              NaN
1     Krystyna  Klub-Duzych-Pup
2        Piotr  Klub-Duzych-Pup
3     Domicela              NaN
4         Jaro              NaN
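Note that dict(df2.values) only works because df2 has exactly two columns. A slightly more explicit variant, sketched below, builds the lookup from named columns instead. It assumes the names in nice_in_there are unique (mapping through a Series with duplicate index values raises an error); the name lookup is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the question.
df1 = pd.DataFrame({"nice_in_here": [np.nan, "Krystyna", "Piotr", "Domicela", "Jaro"]})
df2 = pd.DataFrame({
    "nice_in_there": ["Krystyna", "Elżbieta", "Domicela", "Piotr"],
    "current_club": ["Klub-Duzych-Pup", np.nan, np.nan, "Klub-Duzych-Pup"],
})

# Build a name -> club lookup from the named columns.
# Assumption: every name appears at most once in df2.
lookup = df2.set_index("nice_in_there")["current_club"]

# Names missing from the lookup (and NaN itself) map to NaN,
# so no rows of df1 are dropped.
df1["current_club"] = df1["nice_in_here"].map(lookup)
print(df1)
```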

Option 2

df.merge can be used here:

In [1079]: df1.merge(df2, how='left', left_on='nice_in_here', right_on='nice_in_there')[df2.columns]
Out[1079]: 
  nice_in_there     current_club
0           NaN              NaN
1      Krystyna  Klub-Duzych-Pup
2         Piotr  Klub-Duzych-Pup
3      Domicela              NaN
4           NaN              NaN

Performance

The setup uses a dataset with the same structure as df1, but much longer:

df11 = pd.concat([df1] * 10000)

Here are the timings:

%timeit df11.nice_in_here.map(mapping) # map
100 loops, best of 3: 4.49 ms per loop

%timeit df11.merge(df2, how='left', left_on='nice_in_here', right_on='nice_in_there')[df2.columns] # merge
100 loops, best of 3: 9.61 ms per loop
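The %timeit lines above need IPython. As a rough sketch, the same comparison can be run as a plain script with the standard-library timeit module. The helper names with_map and with_merge are illustrative, ignore_index=True is an addition not present in the original setup, and the absolute numbers will vary with the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Sample frames rebuilt from the question.
df1 = pd.DataFrame({"nice_in_here": [np.nan, "Krystyna", "Piotr", "Domicela", "Jaro"]})
df2 = pd.DataFrame({
    "nice_in_there": ["Krystyna", "Elżbieta", "Domicela", "Piotr"],
    "current_club": ["Klub-Duzych-Pup", np.nan, np.nan, "Klub-Duzych-Pup"],
})

mapping = dict(df2.values)
# Same setup as above: df1 repeated 10000 times (50000 rows).
df11 = pd.concat([df1] * 10000, ignore_index=True)

def with_map():
    return df11["nice_in_here"].map(mapping)

def with_merge():
    merged = df11.merge(df2, how="left", left_on="nice_in_here", right_on="nice_in_there")
    return merged[df2.columns]

# Best of 3 runs, 10 calls per run, reported per call.
calls = 10
t_map = min(timeit.repeat(with_map, number=calls, repeat=3)) / calls
t_merge = min(timeit.repeat(with_merge, number=calls, repeat=3)) / calls
print(f"map:   {t_map * 1e3:.2f} ms per call")
print(f"merge: {t_merge * 1e3:.2f} ms per call")
```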

Join two DataFrames if there is a match between columns
