This relates to a question I asked here: compare two pandas dataframes with unequal columns
See also: How to implement 'in' and 'not in' for Pandas dataframe
I have created two pandas DataFrames:
DataFrame: words
0
0 limited
1 desirable
2 advices
DataFrame: mcDonaldWL
Word Negative Positive Uncertainty
9 abandon 2009 0 0
10 abandoned 2009 0 0
11 desirables 0 2009 0
12 abandonment 2009 0 0
13 advices 2009 0 0
14 abandons 2009 0 0
My goal is to compare words[0] with mcDonaldWL['Word'] and, for every element that occurs there, display the matching row.
Result
Word Negative Positive Uncertainty
11 desirables 0 2009 0
13 advices 2009 0 0
I tried set, intersection, and merge, but could not find a solution. Any ideas?
That does not produce the desired answer. This is not a duplicate.
If I run
words[~words.word.isin(mcDonaldWL)]
I get:
word
0 limited
1 desirable
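A minimal sketch of how `isin` behaves on this data, assuming the column in `words` is labelled `word` as in the snippet above (the frames are rebuilt by hand here). Iterating a DataFrame yields its column labels, so passing the whole frame compares each word with `'Word'`, `'Negative'`, and so on. Passing the `Word` column compares against its values, but only by exact match, so `desirable` never pairs with `desirables`:

```python
import pandas as pd

# Hand-built copies of the two frames from the question
# (the column label "word" is taken from the snippet above).
words = pd.DataFrame({"word": ["limited", "desirable", "advices"]})
mcDonaldWL = pd.DataFrame(
    {
        "Word": ["abandon", "abandoned", "desirables",
                 "abandonment", "advices", "abandons"],
        "Negative": [2009, 2009, 0, 2009, 2009, 2009],
        "Positive": [0, 0, 2009, 0, 0, 0],
        "Uncertainty": [0, 0, 0, 0, 0, 0],
    },
    index=range(9, 15),
)

# Iterating a DataFrame yields its column labels, so this compares each
# word with "Word", "Negative", ... rather than with the listed words.
against_frame = words["word"].isin(mcDonaldWL)

# Passing the column compares against its values, but only exactly.
against_column = words["word"].isin(mcDonaldWL["Word"])

# Rows of `words` with no exact counterpart in the Word column
unmatched = words[~against_column]
print(unmatched)
```

Exact matching is therefore enough for `advices`, while the `desirable`/`desirables` pair needs substring or fuzzy matching.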
Answer 0 (score: 1)
Suppose you have:
>>> df1
col1
0 limited
1 desirables
2 advices
>>> df2
Word Negative Positive Uncertainty
9 abandon 2009 0 0
10 abandoned 2009 0 0
11 desirables 0 2009 0
12 abandonment 2009 0 0
13 advices 2009 0 0
14 abandons 2009 0 0
Note that I have given your first DataFrame a proper column label. In any case, the simplest approach is to use Word
as the index:
>>> df2.set_index('Word', inplace=True)
>>> df2
Negative Positive Uncertainty
Word
abandon 2009 0 0
abandoned 2009 0 0
desirables 0 2009 0
abandonment 2009 0 0
advices 2009 0 0
abandons 2009 0 0
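Then you can look rows up by index label!
>>> # reindex gives a NaN row for a missing label; .loc raises KeyError for it in pandas >= 1.0
>>> df2.reindex(df1.col1.values)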
Negative Positive Uncertainty
Word
limited NaN NaN NaN
desirables 0.0 2009.0 0.0
advices 2009.0 0.0 0.0
>>> df2.reindex(df1.col1.values).dropna()
Negative Positive Uncertainty
Word
desirables 0.0 2009.0 0.0
advices 2009.0 0.0 0.0
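The lookup above returns floats because the `NaN` row for `limited` forces the integer columns to float. A minimal sketch of an alternative, not part of the original answer: filter with a boolean mask from `isin`, which keeps the original dtypes and row labels and needs no `dropna()`. The frames are rebuilt by hand from the listings above.

```python
import pandas as pd

# Hand-built copies of df1 and df2 as listed above
df1 = pd.DataFrame({"col1": ["limited", "desirables", "advices"]})
df2 = pd.DataFrame(
    {
        "Word": ["abandon", "abandoned", "desirables",
                 "abandonment", "advices", "abandons"],
        "Negative": [2009, 2009, 0, 2009, 2009, 2009],
        "Positive": [0, 0, 2009, 0, 0, 0],
        "Uncertainty": [0, 0, 0, 0, 0, 0],
    },
    index=range(9, 15),
)

# Keep the rows of df2 whose Word appears in df1.col1 (exact match only).
result = df2[df2["Word"].isin(df1["col1"])]
print(result)
```

Like the index lookup, this only finds exact matches, so it relies on `df1` containing `desirables` rather than `desirable`.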
Answer 1 (score: 1)
Use fuzzy matching
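import numpy as np
from fuzzywuzzy import process

# Candidate strings to match against: the single column of `words`
l = words.iloc[:, 0].values.tolist()
a = []
for x in mcDonaldWL.Word:
    # process.extract returns a list of (match, score) tuples; take the best one
    match, score = process.extract(x, l, limit=1)[0]
    if score >= 80:
        a.append(match)
    else:
        a.append(np.nan)
mcDonaldWL['canfind'] = a
# Keep only the rows that found a match, then drop the helper column
mcDonaldWL.dropna().drop(columns='canfind')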
Out[494]:
Word Negative Positive Uncertainty
11 desirables 0 2009 0
13 advices 2009 0 0
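For readers without `fuzzywuzzy` installed, here is a rough sketch of the same idea using `difflib.SequenceMatcher` from the standard library as a stand-in. It is a swapped-in technique, not the scorer used above: `ratio()` returns a value between 0 and 1, so the cutoff of 0.8 is an assumed analogue of the score of 80, and the helper name `best_match` is hypothetical. The frames are rebuilt by hand.

```python
from difflib import SequenceMatcher

import pandas as pd

# Hand-built copies of the two frames from the question
words = pd.DataFrame({0: ["limited", "desirable", "advices"]})
mcDonaldWL = pd.DataFrame(
    {
        "Word": ["abandon", "abandoned", "desirables",
                 "abandonment", "advices", "abandons"],
        "Negative": [2009, 2009, 0, 2009, 2009, 2009],
        "Positive": [0, 0, 2009, 0, 0, 0],
        "Uncertainty": [0, 0, 0, 0, 0, 0],
    },
    index=range(9, 15),
)

candidates = words.iloc[:, 0].tolist()


def best_match(term, choices, cutoff=0.8):
    """Hypothetical helper: most similar choice, or None if below cutoff."""
    scored = [(SequenceMatcher(None, term, c).ratio(), c) for c in choices]
    score, choice = max(scored)
    return choice if score >= cutoff else None


# One best match (or None) per lexicon entry
matches = mcDonaldWL["Word"].map(lambda w: best_match(w, candidates))

# Keep only the lexicon rows that found a sufficiently close word
result = mcDonaldWL[matches.notna()]
print(result)
```

The cutoff is the main design choice: set it too low and unrelated words start to pair up, set it too high and `desirables` no longer matches `desirable`.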
Answer 2 (score: 1)
Method 1
import numpy as np

ws = words.values.ravel().astype(str)
wl = mcDonaldWL.Word.values.astype(str)
# np.char.find returns -1 where the search word does not occur in the entry
mcDonaldWL[(np.char.find(wl[:, None], ws) >= 0).any(axis=1)]
Word Negative Positive Uncertainty
11 desirables 0 2009 0
13 advices 2009 0 0
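To see what Method 1 computes, here is a small sketch on the same strings (typed in by hand, and using `np.char.find`, the public alias of the same function). Broadcasting `wl[:, None]` against `ws` gives one row per lexicon entry and one column per search word; each cell holds the position at which the search word occurs inside the entry, or -1 if it does not occur.

```python
import numpy as np

# Search words and lexicon entries from the question, as plain arrays
ws = np.array(["limited", "desirable", "advices"])
wl = np.array(["abandon", "abandoned", "desirables",
               "abandonment", "advices", "abandons"])

# Shape (6, 1) against shape (3,) broadcasts to (6, 3).
positions = np.char.find(wl[:, None], ws)
print(positions)

# A lexicon entry is kept when any search word occurs inside it.
mask = (positions >= 0).any(axis=1)
print(wl[mask])
```

Because this is a substring test, `desirable` is found inside `desirables`, which exact matching would miss.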
Method 2
mcDonaldWL[mcDonaldWL.Word.str.contains('|'.join(words.values.ravel()))]
Word Negative Positive Uncertainty
11 desirables 0 2009 0
13 advices 2009 0 0
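Method 2 treats the joined words as a regular expression, so a word containing a metacharacter such as `.` or `+` would change the pattern. A minimal sketch of a guarded variant, assuming literal substring matching is what is wanted: escape each word with `re.escape` before joining. The frames are rebuilt by hand.

```python
import re

import pandas as pd

# Hand-built copies of the two frames from the question
words = pd.DataFrame({0: ["limited", "desirable", "advices"]})
mcDonaldWL = pd.DataFrame(
    {
        "Word": ["abandon", "abandoned", "desirables",
                 "abandonment", "advices", "abandons"],
        "Negative": [2009, 2009, 0, 2009, 2009, 2009],
        "Positive": [0, 0, 2009, 0, 0, 0],
        "Uncertainty": [0, 0, 0, 0, 0, 0],
    },
    index=range(9, 15),
)

# Escape each word so it is matched literally, then join as alternatives.
pattern = "|".join(re.escape(w) for w in words.values.ravel())

# str.contains keeps rows where any alternative occurs as a substring.
result = mcDonaldWL[mcDonaldWL["Word"].str.contains(pattern)]
print(result)
```

For the plain alphabetic words in this example the escaped pattern is identical to the unescaped one, so the result is the same as above.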
Answer 3 (score: 0)
In words you have "desirable", but in mcDonaldWL you have "desirables". Assuming these are meant to be the same, you can do:
mcDonaldWL.set_index('Word', inplace=True)
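# .loc raises KeyError for labels missing from the index (pandas >= 1.0),
# so reindex and drop the rows that found no match
mcDonaldWL.reindex(words[0]).dropna()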
Also, "advices" is not a word.