Find phrases stored in a dataframe within sentences found in another dataframe

Time: 2018-08-29 08:53:37

Tags: python pandas dataframe

Suppose I have 2 dataframes: one contains various subjects, and the other contains text from which those subjects should be extracted.

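Given the dataframes below, I would like the output for the text dataframe to list the subjects found in each sentence: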

sub = pd.DataFrame(['Little Red', 'Grow Your', 'James Bond', 'Tom Brady'])
text = pd.DataFrame(['Little Red Corvette must Grow Your ego', 'Grow Your Beans', 'James Dean and his Little Red coat', 'I love pasta'])

Any ideas on how to achieve this? I was looking at this question: Check if words in one dataframe appear in another (python 3, pandas), but it is not quite the same as my desired output. Thanks.

1 answer:

Answer 0: (score: 5)

Use str.findall with a pattern built by wrapping each value of sub in regex word boundaries (\b) and joining them all with |:

pat = '|'.join(r"\b{}\b".format(x) for x in sub[0])
text['new'] = text[0].str.findall(pat).str.join(', ')
print (text)
                                        0                    new
0  Little Red Corvette must Grow Your ego  Little Red, Grow Your
1                         Grow Your Beans              Grow Your
2      James Dean and his Little Red coat             Little Red
3                            I love pasta                       
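
The snippet above assumes pandas is already imported and that the phrases contain no regex metacharacters. A self-contained sketch follows; the re.escape call is a precaution added here as an assumption and is not part of the original answer:

```python
import re

import pandas as pd

sub = pd.DataFrame(['Little Red', 'Grow Your', 'James Bond', 'Tom Brady'])
text = pd.DataFrame(['Little Red Corvette must Grow Your ego',
                     'Grow Your Beans',
                     'James Dean and his Little Red coat',
                     'I love pasta'])

# Escape each phrase so characters such as "." or "+" would be matched
# literally, wrap it in word boundaries, and join the alternatives with "|".
pat = '|'.join(r"\b{}\b".format(re.escape(x)) for x in sub[0])

# findall returns a list of matches per row; str.join turns each list
# into one string (an empty list becomes an empty string).
text['new'] = text[0].str.findall(pat).str.join(', ')
print(text)
```

For the sample phrases, escaping does not change which rows match; it only matters when a subject contains characters that have a special meaning in a regex.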

If you want NaN for rows with no match, use loc with a boolean mask:

pat = '|'.join(r"\b{}\b".format(x) for x in sub[0])
lists = text[0].str.findall(pat)
m = lists.astype(bool)
text.loc[m, 'new'] = lists.loc[m].str.join(',')
print (text)
                                        0                   new
0  Little Red Corvette must Grow Your ego  Little Red,Grow Your
1                         Grow Your Beans             Grow Your
2      James Dean and his Little Red coat            Little Red
3                            I love pasta                   NaN
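
As a possible alternative to the loc approach (a sketch, not the answer's method), the joined strings can be built first and the empty ones blanked out with Series.where, which by default substitutes NaN wherever the condition is False. The variable name joined is introduced here for illustration only:

```python
import pandas as pd

sub = pd.DataFrame(['Little Red', 'Grow Your', 'James Bond', 'Tom Brady'])
text = pd.DataFrame(['Little Red Corvette must Grow Your ego',
                     'Grow Your Beans',
                     'James Dean and his Little Red coat',
                     'I love pasta'])

# Same pattern as in the answer: each phrase wrapped in word boundaries.
pat = '|'.join(r"\b{}\b".format(x) for x in sub[0])

# Rows without any match produce an empty string after the join.
joined = text[0].str.findall(pat).str.join(',')

# Keep non-empty strings; rows where the condition is False become NaN.
text['new'] = joined.where(joined != '')
print(text)
```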