I have lists of words associated with emotions (anger, fear, anticipation, trust, etc.).
The anticipation list:
{'anticipation': ['abundance',
'opera',
'star',
'start',
'achievement',
'acquiring',...]
I also have a dataframe whose rows contain sentences. I want to find the emotion-related words in each row.
| text |
|--------------------------- |
| operation start yesterday |
| operation start now |
| operation halt |
Expected output:
| text | result |
|--------------------------- |------------- |
| operation start yesterday | start |
| operation start now | start |
| operation achievement | achievement |
I tried:
df['result']=df["text"].str.findall(r"\b"+"|".join(anticipationlist) +r"\b").apply(", ".join)
My result is:
| text | result |
|--------------------------- |-------------------- |
| operation start yesterday | opera, star |
| operation start now | opera, star |
| operation achievement | opera, achievement |
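How can I improve the code to get the desired result?
Answer 0 (score: 1)
You can add word boundaries to each value separately, so that every word in the list must match as a whole word: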
pat = '|'.join(r"\b{}\b".format(x) for x in anticipationlist)
df['result']=df["text"].str.findall(pat).apply(", ".join)
print (df)
text result
0 operation start yesterday start
1 operation start now start
2 operation achievement achievement
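The original pattern fails because alternation (`|`) has the lowest precedence in a regular expression: the leading `\b` applies only to the first word and the trailing `\b` only to the last, so `opera` and `star` match inside longer words. As an alternative to repeating `\b` for each word, the sketch below wraps the alternation in one non-capturing group. It assumes `anticipationlist` is a plain list of words (not the dict from the question) and uses `re.escape` in case a word contains regex metacharacters; the sample data is copied from the question.

```python
import re

import pandas as pd

# Assumption: the words are held in a plain list, not in the dict from the question.
anticipationlist = ['abundance', 'opera', 'star', 'start', 'achievement', 'acquiring']

df = pd.DataFrame({'text': ['operation start yesterday',
                            'operation start now',
                            'operation achievement']})

# A single pair of word boundaries around a non-capturing group of escaped words.
pat = r"\b(?:" + "|".join(re.escape(word) for word in anticipationlist) + r")\b"

# findall returns a list of whole-word matches per row; join them for display.
df['result'] = df['text'].str.findall(pat).apply(", ".join)
print(df)
```

Both forms should give the same matches here; the grouped form just keeps the pattern shorter when the word list is long.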
Answer 1 (score: 0)
Here is an approach that does not use regular expressions. I also changed your anticipationlist from a dict to a list.
import pandas as pd
anticipationlist= ['abundance',
'opera',
'star',
'start',
'achievement',
'acquiring',
]
values = [
'operation start yesterday',
'operation start now',
'operation achievement',
]
df = pd.DataFrame(data=values, columns=['text'])
def find_values(x):
    # Collect every whitespace-separated word that exactly equals a list entry
    results = []
    for value in anticipationlist:
        for word in x.split():
            if word == value:
                results.append(word)
    return ' '.join(results)

df['result'] = df['text'].apply(find_values)
print(df.head())
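The nested loops above compare every word in the sentence with every entry in the list. A possible variation, sketched here under the assumption that exact, case-sensitive, whitespace-separated matches are what is wanted, stores the words in a set so each lookup is a single membership test. The names `anticipation_words` and `find_words` are hypothetical, introduced only for this illustration.

```python
import pandas as pd

# Hypothetical name: the same words as anticipationlist, stored as a set.
anticipation_words = {'abundance', 'opera', 'star', 'start', 'achievement', 'acquiring'}

def find_words(text):
    # Keep the words of the sentence, in sentence order, that are in the set.
    return ' '.join(word for word in text.split() if word in anticipation_words)

df = pd.DataFrame({'text': ['operation start yesterday',
                            'operation start now',
                            'operation achievement']})
df['result'] = df['text'].apply(find_words)
print(df)
```

Unlike the loop version, this returns matches in the order they appear in the sentence rather than in list order. Like the loop version, it will miss a word that has punctuation attached (for example `start,`), because `split()` only separates on whitespace.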