Join two dataframes if there is a match between columns

Time: 2017-08-22 10:01:47

Tags: python pandas dataframe

So I have these two DFs. This is DF1['nice_in_here']:

                         nice_in_here
0                                 NaN
1                            Krystyna
2                               Piotr  
3                            Domicela
4                                Jaro

This is DF2[['nice_in_there', 'current_club']]:

    nice_in_there               current_club
0   Krystyna                    Klub-Duzych-Pup
1   Elżbieta                    NaN
2   Domicela                    NaN
3   Piotr                       Klub-Duzych-Pup

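So what I want is:

Check whether each DF2["nice_in_there"] value appears in DF1["nice_in_here"]. If it does, I want to attach the corresponding DF2["current_club"] to that row of DF1["nice_in_here"].

The result I want is (after typing DF1[['nice_in_here', 'current_club']]):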

                         nice_in_here        current_club
0                                 NaN                 NaN
1                            Krystyna     Klub-Duzych-Pup
2                               Piotr     Klub-Duzych-Pup
3                            Domicela                 NaN
4                                Jaro                 NaN

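Please note that I don't want to drop the NaNs, because missing values matter to me.

Please help, this is driving me mad!

2 answers:

Answer 0: (score: 0)

This should work: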

pd.merge(DF1, DF2, how="left", left_on="nice_in_here", right_on="nice_in_there") 
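For anyone who wants to try this end to end, here is a minimal sketch. The two frames are rebuilt by hand from the sample data in the question, and dropping the leftover nice_in_there key column is an extra step assumed here to match the requested layout; it is not part of the one-liner above.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
DF1 = pd.DataFrame({"nice_in_here": [np.nan, "Krystyna", "Piotr", "Domicela", "Jaro"]})
DF2 = pd.DataFrame({
    "nice_in_there": ["Krystyna", "Elżbieta", "Domicela", "Piotr"],
    "current_club": ["Klub-Duzych-Pup", np.nan, np.nan, "Klub-Duzych-Pup"],
})

# A left merge keeps every row of DF1, including the row whose name is NaN.
merged = pd.merge(DF1, DF2, how="left", left_on="nice_in_here", right_on="nice_in_there")

# The key column coming from DF2 is redundant after the merge (assumed cleanup step).
result = merged.drop(columns="nice_in_there")
print(result)
```

Because how="left" is used, names that only exist in DF2 (Elżbieta here) do not add rows to the result.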

Answer 1: (score: 0)

Option 1

You can use Series.map (here, map called on the df1.nice_in_here column):

In [1073]: mapping = dict(df2.values)

In [1074]: df1['current_club'] = df1.nice_in_here.map(mapping); df1
Out[1074]: 
  nice_in_here     current_club
0          NaN              NaN
1     Krystyna  Klub-Duzych-Pup
2        Piotr  Klub-Duzych-Pup
3     Domicela              NaN
4         Jaro              NaN
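One caveat: dict(df2.values) only works when df2 has exactly two columns, in key/value order. A slightly more explicit variant, sketched below under the assumption that the frames look like the question's sample data, builds the lookup as a Series indexed by name. The drop_duplicates call is a precaution added here, since mapping through a Series with repeated index labels raises an error.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df1 = pd.DataFrame({"nice_in_here": [np.nan, "Krystyna", "Piotr", "Domicela", "Jaro"]})
df2 = pd.DataFrame({
    "nice_in_there": ["Krystyna", "Elżbieta", "Domicela", "Piotr"],
    "current_club": ["Klub-Duzych-Pup", np.nan, np.nan, "Klub-Duzych-Pup"],
})

# Lookup Series: index = name, value = club. Keep the first row per name
# so that the index is unique.
mapping = (
    df2.drop_duplicates("nice_in_there")
       .set_index("nice_in_there")["current_club"]
)

# Names missing from the lookup (and the NaN name itself) map to NaN.
df1["current_club"] = df1["nice_in_here"].map(mapping)
print(df1)
```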

Option 2

df.merge can be used here:

In [1079]: df1.merge(df2, how='left', left_on='nice_in_here', right_on='nice_in_there')[df2.columns]
Out[1079]: 
  nice_in_there     current_club
0           NaN              NaN
1      Krystyna  Klub-Duzych-Pup
2         Piotr  Klub-Duzych-Pup
3      Domicela              NaN
4           NaN              NaN
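Note that selecting df2.columns returns the nice_in_there column, so the unmatched name Jaro shows up as NaN in row 4. A possible variant, assuming the goal is the layout asked for in the question, selects nice_in_here instead. The frames below are rebuilt from the question's sample data.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df1 = pd.DataFrame({"nice_in_here": [np.nan, "Krystyna", "Piotr", "Domicela", "Jaro"]})
df2 = pd.DataFrame({
    "nice_in_there": ["Krystyna", "Elżbieta", "Domicela", "Piotr"],
    "current_club": ["Klub-Duzych-Pup", np.nan, np.nan, "Klub-Duzych-Pup"],
})

# Same left merge, but keep the left-hand key so every df1 name survives.
out = df1.merge(
    df2, how="left", left_on="nice_in_here", right_on="nice_in_there"
)[["nice_in_here", "current_club"]]
print(out)
```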

Performance

The setup uses a dataset with the same structure as df1, but much longer:

df11 = pd.concat([df1] * 10000)

Here are the timings:

%timeit df11.nice_in_here.map(mapping) # map
100 loops, best of 3: 4.49 ms per loop

%timeit df11.merge(df2, how='left', left_on='nice_in_here', right_on='nice_in_there')[df2.columns] # merge
100 loops, best of 3: 9.61 ms per loop
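Outside IPython, a comparable measurement can be sketched with the standard timeit module. The frames are rebuilt from the question's sample data and the repeat count (number=10) is an arbitrary choice. Absolute numbers will differ by machine and pandas version, so none are shown here. The sketch also checks that both approaches produce the same club column.

```python
import timeit

import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df1 = pd.DataFrame({"nice_in_here": [np.nan, "Krystyna", "Piotr", "Domicela", "Jaro"]})
df2 = pd.DataFrame({
    "nice_in_there": ["Krystyna", "Elżbieta", "Domicela", "Piotr"],
    "current_club": ["Klub-Duzych-Pup", np.nan, np.nan, "Klub-Duzych-Pup"],
})

mapping = dict(df2.values)        # name -> club, as in Option 1
df11 = pd.concat([df1] * 10000)   # 50,000 rows


def with_map():
    return df11["nice_in_here"].map(mapping)


def with_merge():
    return df11.merge(
        df2, how="left", left_on="nice_in_here", right_on="nice_in_there"
    )[df2.columns]


# Sanity check: both approaches yield the same club column (NaN-safe comparison).
same = (
    with_map().fillna("<missing>").tolist()
    == with_merge()["current_club"].fillna("<missing>").tolist()
)

t_map = timeit.timeit(with_map, number=10)
t_merge = timeit.timeit(with_merge, number=10)
print(f"same result: {same}")
print(f"map:   {t_map / 10 * 1000:.2f} ms per call")
print(f"merge: {t_merge / 10 * 1000:.2f} ms per call")
```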