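I am using SentimentIntensityAnalyzer in Python to extract positive, negative, and neutral keywords. The remarks.txt file (UTF-8 encoded) contains 10,000 comments. I want to import the text file, read the comments one line at a time, and use SentimentIntensityAnalyzer to extract the positive, negative, and neutral keywords from each. Specifically, I want to analyze the comments in column c2 and put the extracted keywords in new adjacent columns. I wrote a small program that calls SentimentIntensityAnalyzer from nltk's sentiment.vader module. I created a get_keywords() function, but I am having trouble passing each row of the DataFrame to it as an argument, calling it in a for loop to produce the keywords, and storing the results in the adjacent columns.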
import nltk
import pandas as pd
from nltk.tokenize import word_tokenize, RegexpTokenizer
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()

# Load the tab-separated comments file (columns c1 and c2)
remarks = pd.read_csv('/Users/XYZ/Desktop/comments/Comments.txt', sep='\t')
df = pd.DataFrame(remarks)
remarks.head(5)

def get_keywords(row, **kwargs):
    pos_word_list = []
    neu_word_list = []
    neg_word_list = []
    # Sentiment scores for the whole comment, scaled to 0-10
    sentiment_score = sid.polarity_scores(str(row))
    positive_meter = round((sentiment_score['pos'] * 10), 2)
    neutral_meter = round((sentiment_score['neu'] * 10), 2)
    negative_meter = round((sentiment_score['neg'] * 10), 2)
    tokenized_sentence = nltk.word_tokenize(str(row))

# Apply the function to every comment in column c2
df['positive_words'] = df.c2.apply(get_keywords, k='positive')
df['neutral_words'] = df.c2.apply(get_keywords, k='neutral')
df['negative_words'] = df.c2.apply(get_keywords, k='negative')

for index, row in df.iterrows():
    if (sid.polarity_scores(str(tokenized_sentence))['compound']) >= 0.5:
        pos_word_list.append(str(tokenized_sentence))
    elif (sid.polarity_scores(str(tokenized_sentence))['compound']) <= -0.5:
        neg_word_list.append(str(tokenized_sentence))
    else:
        neu_word_list.append(str(tokenized_sentence))
    print(row['c1'], row['c2'], "Positive : {}, Neutral: {}, Negative : {},Positive words: {}, Neutral words: {}, Negative words: {}".format(row['positive'], row['neutral'], row['negative'], row['pos_word_list'], row['neu_word_list'], row['neg_word_list']))

df.to_csv('Comments_modified.csv')
df.head(15)
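Expected output: a Comments_modified file containing the columns c1 (serial number), c2 (comment), Positive (positive sentiment score), Neutral (neutral sentiment score), and Negative (negative sentiment score), plus the extracted keyword columns, with values in the corresponding row for each of the 10,000 comments.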
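Error: I get the error below. In addition, I get "None" in the Positive words, Neutral words, and Negative words columns of every row. How can I achieve the expected result?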
Traceback (most recent call last):
  File "", line 26, in
    if (sid.polarity_scores(tokenized_sentence)['compound']) >= 0.5:
  File "C:\ProgramData\Anaconda3\lib\site-packages\nltk\sentiment\vader.py", line 353, in polarity_scores
    sentitext = SentiText(text)
  File "C:\ProgramData\Anaconda3\lib\site-packages\nltk\sentiment\vader.py", line 284, in __init__
    text = str(text.encode('utf-8'))
AttributeError: 'list' object has no attribute 'encode'
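
For anyone reading along, two things seem to explain the symptoms. polarity_scores() expects a single string, so handing it the token list from word_tokenize() raises the AttributeError above. And get_keywords() has no return statement, so every apply() call fills the new column with None. Below is a minimal sketch of one way the pieces could fit together, not a definitive fix. The helper names split_keywords and add_keyword_columns, the regex tokenizer (a dependency-free stand-in for nltk.word_tokenize), and the 0.05 threshold are all assumptions made for illustration; the 0.5 cutoff from the program above is likely too strict when scoring one word at a time, since single-word compound scores can fall below it.

```python
import re


def split_keywords(comment, score_word, threshold=0.05):
    """Sort the words of one comment into positive/neutral/negative lists.

    score_word: any callable mapping a single word (str) to a score in
    [-1, 1]. With VADER this could be
        lambda w: sid.polarity_scores(w)['compound']
    threshold: assumed cutoff (hypothetical default, not from the question).
    """
    # Assumption: a simple regex tokenizer instead of nltk.word_tokenize
    words = re.findall(r"[A-Za-z']+", str(comment))
    result = {'positive': [], 'neutral': [], 'negative': []}
    for word in words:
        score = score_word(word)  # one string at a time, never a list
        if score >= threshold:
            result['positive'].append(word)
        elif score <= -threshold:
            result['negative'].append(word)
        else:
            result['neutral'].append(word)
    # Returning a value is what keeps apply() from producing None
    return result


def add_keyword_columns(df, sid, text_col='c2'):
    """Add score and keyword columns to a pandas DataFrame, row by row.

    df:  DataFrame with the comments in text_col
    sid: an nltk SentimentIntensityAnalyzer instance
    """
    def score_word(word):
        return sid.polarity_scores(word)['compound']

    # Whole-comment scores: one dict per row with 'pos', 'neu', 'neg' keys
    scores = df[text_col].apply(lambda c: sid.polarity_scores(str(c)))
    for column, key in (('Positive', 'pos'), ('Neutral', 'neu'), ('Negative', 'neg')):
        df[column] = scores.apply(lambda s: s[key])

    # Per-word keywords: one dict of three lists per row
    keywords = df[text_col].apply(lambda c: split_keywords(c, score_word))
    for label in ('positive', 'neutral', 'negative'):
        df[label + '_words'] = keywords.apply(lambda k: ', '.join(k[label]))
    return df


# Usage with the setup from the question (needs nltk's 'vader_lexicon' data):
#   sid = SentimentIntensityAnalyzer()
#   df = pd.read_csv('/Users/XYZ/Desktop/comments/Comments.txt', sep='\t')
#   add_keyword_columns(df, sid).to_csv('Comments_modified.csv', index=False)
```

Passing the scorer in as a callable keeps split_keywords independent of nltk, so it can be tried on a handful of comments with a toy scorer before running all 10,000 rows.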