Question

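I have two DataFrames, df1 and df2. I want to compare the values of one column in df1 against the values of several columns in df2, and return the unique intersection, without duplicates, in another DataFrame.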

DF1

       WORD    
0     This     
1       is    
2        a    
3    sample   
4  sentence   
5        to  
6     check  
7      NLP   
8        in   
9    python

df2

Noun    Verb
Car     stand 
Sample  sit
        walk
        run
        is

Expected output

DF3

    Noun      Verb
    sample    is

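I get the result below with the following code. It is a roundabout way of getting there, and it is not quite right: for every matching Noun row it also returns the Verb value, which I clearly don't want.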

import pandas as pd
df1 = pd.read_csv("1.csv")
df2 = pd.read_csv("2.csv")
df3 = df1.merge(df2, left_on=['Word'], right_on=['Noun'])
print(df3.drop(columns='Verb'))

   Word  Noun
0  this  this
1    is    is
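The behaviour described in the question follows from how merge works: it joins whole rows, so a match on Noun carries along whatever Verb sits in the same df2 row. A minimal sketch of that effect, using small made-up frames rather than the asker's CSV files:

```python
import pandas as pd

# Tiny illustrative frames (not the asker's actual data).
df1 = pd.DataFrame({"WORD": ["is", "sample"]})
df2 = pd.DataFrame({"Noun": ["car", "sample"], "Verb": ["stand", "sit"]})

# merge joins entire rows: the match on Noun == "sample" also brings
# along the Verb from that same df2 row ("sit"), even though "sit"
# never appears in df1.
joined = df1.merge(df2, left_on="WORD", right_on="Noun")
print(joined)
```

Because of this, each df2 column probably needs to be compared with df1 on its own instead of through a single row-wise join.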

Answer 1

Use numpy.intersect1d:

pd.DataFrame([np.intersect1d(x,df1.WORD.values) for x in df2.values.T],index=df2.columns).T
Out[147]: 
     Noun Verb
0  Sample   is
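For reference, here is a self-contained sketch of the same numpy.intersect1d idea. It rebuilds the question's data inline, which involves two assumptions: blank cells in df2 are taken to be empty strings, and both sides are lower-cased before comparing, since intersect1d is case-sensitive and the question has "sample" in df1 but "Sample" in df2.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"WORD": ["This", "is", "a", "sample", "sentence",
                             "to", "check", "NLP", "in", "python"]})
# Assumption: blank cells are empty strings, not NaN.
df2 = pd.DataFrame({"Noun": ["Car", "Sample", "", "", ""],
                    "Verb": ["stand", "sit", "walk", "run", "is"]})

# Lower-case both sides so "Sample" and "sample" compare equal.
words = df1["WORD"].str.lower().tolist()

# intersect1d returns the sorted, de-duplicated values shared by both inputs;
# it is applied once per df2 column.
common = [np.intersect1d(df2[col].str.lower().tolist(), words).tolist()
          for col in df2.columns]

# One row per df2 column, then transpose so the columns are Noun and Verb.
df3 = pd.DataFrame(common, index=df2.columns).T
print(df3)
```

Lower-casing means the result holds the lower-cased words, which matches the expected output in the question.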

If you want to use pandas:

df2.mul(df2.apply(lambda x : x.isin(df1.WORD))).apply(lambda x : sorted(x)).iloc[[-1],:]
Out[159]: 
     Noun Verb
4  Sample   is
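A more explicit pandas variant may be easier to read than the one-liner. The sketch below is not the answerer's code: the helper unique_matches is a hypothetical name, blank cells are assumed to be missing values, every df2 column is assumed to hold strings, and matching is done case-insensitively while the original spelling from df2 is kept.

```python
import pandas as pd

df1 = pd.DataFrame({"WORD": ["This", "is", "a", "sample", "sentence",
                             "to", "check", "NLP", "in", "python"]})
# Assumption: blank cells are missing values (None / NaN).
df2 = pd.DataFrame({"Noun": ["Car", "Sample", None, None, None],
                    "Verb": ["stand", "sit", "walk", "run", "is"]})

# Lower-cased vocabulary from df1 for case-insensitive lookups.
words = set(df1["WORD"].str.lower())


def unique_matches(column, vocabulary):
    """Return the distinct values of `column` whose lower-cased form is in `vocabulary`."""
    values = column.dropna()
    hits = values[values.str.lower().isin(vocabulary)]
    return hits.drop_duplicates().reset_index(drop=True)


# Each df2 column is filtered independently, so a match in Noun
# does not drag along the Verb from the same row.
df3 = pd.DataFrame({col: unique_matches(df2[col], words) for col in df2.columns})
print(df3)
```

Resetting the index on each filtered column lets columns with different numbers of matches line up from row 0, with the shorter ones padded by missing values.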

Column comparison and returning values

1 answer: