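I have two DataFrames, df1 and df2. I want to compare the values in one column of df1 against the values in several columns of df2, and return the unique intersection, without duplicates, in another DataFrame.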
DF1
WORD
0 This
1 is
2 a
3 sample
4 sentence
5 to
6 check
7 NLP
8 in
9 python
df2
Noun    Verb
Car     stand
Sample  sit
        walk
        run
        is
Expected output
DF3
Noun Verb
sample is
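I get the result below with the following code. It is a roundabout way of getting there, and it is not quite right either: it returns the Verb value from every row whose Noun matched, which is clearly not what I want.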
import pandas as pd
df1 = pd.read_csv("1.csv")
df2 = pd.read_csv("2.csv")
df3 = df1.merge(df2, left_on=['Word'], right_on=['Noun'])
print(df3.drop(columns='Verb'))
Word Noun
0 this this
1 is is
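To make the problem concrete, here is a small self-contained sketch. It rebuilds the two frames inline instead of reading 1.csv and 2.csv, assumes the empty Noun cells are NaN, and adds a lowercased helper column called key (my own name, not part of the original data) so that "sample" and "Sample" line up. The merge only tests the Noun column, so Verb just comes along from the same row.

```python
import numpy as np
import pandas as pd

# Inline stand-ins for the two CSV files in the question.
df1 = pd.DataFrame({"WORD": ["This", "is", "a", "sample", "sentence",
                             "to", "check", "NLP", "in", "python"]})
df2 = pd.DataFrame({"Noun": ["Car", "Sample", np.nan, np.nan, np.nan],
                    "Verb": ["stand", "sit", "walk", "run", "is"]})

# "key" is a hypothetical helper column used only for case-insensitive matching.
left = df1.assign(key=df1["WORD"].str.lower())
right = df2.dropna(subset=["Noun"]).assign(key=lambda d: d["Noun"].str.lower())

# Only Noun is compared; Verb is whatever sits in the matched row.
merged = left.merge(right, on="key")
print(merged[["WORD", "Noun", "Verb"]])
```

The single matched row carries Verb "sit", which never appears in df1, while "is" (which does appear) is lost because its row has no Noun.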
Answer 0 (score: 3)
Use numpy.intersect1d:
import numpy as np
pd.DataFrame([np.intersect1d(x,df1.WORD.values) for x in df2.values.T],index=df2.columns).T
Out[147]:
Noun Verb
0 Sample is
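A more defensive variant of the same idea, as a sketch only. It assumes the empty Noun cells are NaN after read_csv and that matching should ignore case; neither is stated in the question. NaN is dropped before calling np.intersect1d because mixing strings and floats in one array can fail when NumPy sorts it. np.intersect1d returns sorted, unique values, so duplicates are removed for free.

```python
import numpy as np
import pandas as pd

# Inline stand-ins for the two CSV files in the question.
df1 = pd.DataFrame({"WORD": ["This", "is", "a", "sample", "sentence",
                             "to", "check", "NLP", "in", "python"]})
df2 = pd.DataFrame({"Noun": ["Car", "Sample", np.nan, np.nan, np.nan],
                    "Verb": ["stand", "sit", "walk", "run", "is"]})

# Lowercase both sides so "Sample" and "sample" compare equal (an assumption).
words = df1["WORD"].str.lower().to_numpy()

# Intersect each df2 column with the word list, one column at a time.
matches = {
    col: np.intersect1d(df2[col].dropna().str.lower().to_numpy(), words)
    for col in df2.columns
}

# Columns can have different numbers of hits; wrapping each result in a
# Series lets pandas pad the shorter ones with NaN.
df3 = pd.DataFrame({col: pd.Series(vals) for col, vals in matches.items()})
print(df3)
```

Note that the matched values come back lowercased here, because the comparison is done on the lowercased copies.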
If you would rather use pandas:
df2.mul(df2.apply(lambda x : x.isin(df1.WORD))).apply(lambda x : sorted(x)).iloc[[-1],:]
Out[159]:
Noun Verb
4 Sample is
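For comparison, a sketch of a pandas-only version built on Series.isin rather than string multiplication. It is not the one-liner above, just another way to get the same kind of result. The helper name matched is mine, and the case-insensitive comparison and NaN handling are assumptions about the data.

```python
import numpy as np
import pandas as pd

# Inline stand-ins for the two CSV files in the question.
df1 = pd.DataFrame({"WORD": ["This", "is", "a", "sample", "sentence",
                             "to", "check", "NLP", "in", "python"]})
df2 = pd.DataFrame({"Noun": ["Car", "Sample", np.nan, np.nan, np.nan],
                    "Verb": ["stand", "sit", "walk", "run", "is"]})

words = set(df1["WORD"].str.lower())

def matched(col):
    """Hypothetical helper: unique values of one column that occur in df1."""
    s = col.dropna()
    hits = s[s.str.lower().isin(words)]
    # Renumber from 0 so results from different columns line up side by side.
    return hits.drop_duplicates().reset_index(drop=True)

# Dict keys become the column names of the combined frame.
df3 = pd.concat({col: matched(df2[col]) for col in df2.columns}, axis=1)
print(df3)
```

This keeps the original spelling from df2 ("Sample"), since only the comparison is lowercased.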