Extracting positive, negative, and neutral keywords from a DataFrame using SentimentIntensityAnalyzer from the nltk.sentiment.vader library

Time: 2019-09-10 04:47:01

Tags: python-3.6 vader

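I am using SentimentIntensityAnalyzer in Python to extract positive, negative, and neutral keywords. I have 10,000 comments in the file remarks.txt (encoded as UTF-8). I want to import the text file, read each comment row by row, and extract the positive, negative, and neutral keywords with SentimentIntensityAnalyzer. The comments to analyze are in column c2, and the extracted keywords should go into new adjacent columns. I wrote a small program that calls SentimentIntensityAnalyzer from the nltk.sentiment.vader library, and I created a get_keywords() function, but I am having trouble passing each row of the data frame as an argument, calling the function in a for loop, and storing the returned keywords in the adjacent columns.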

    import nltk
    import pandas as pd
    from nltk.tokenize import word_tokenize, RegexpTokenizer
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    sid = SentimentIntensityAnalyzer()
    # Load the tab-separated comments file into a DataFrame
    remarks = pd.read_csv('/Users/XYZ/Desktop/comments/Comments.txt', sep='\t')
    df = pd.DataFrame(remarks)
    remarks.head(5)

    def get_keywords(row, **kwargs):
        pos_word_list = []
        neu_word_list = []
        neg_word_list = []

        sentiment_score = sid.polarity_scores(str(row))
        positive_meter = round((sentiment_score['pos'] * 10), 2)
        neutral_meter = round((sentiment_score['neu'] * 10), 2)
        negative_meter = round((sentiment_score['neg'] * 10), 2)
        tokenized_sentence = nltk.word_tokenize(str(row))

    df['positive_words'] = df.c2.apply(get_keywords, k='positive')
    df['neutral_words'] = df.c2.apply(get_keywords, k='neutral')
    df['negative_words'] = df.c2.apply(get_keywords, k='negative')

    for index, row in df.iterrows():
        if (sid.polarity_scores(str(tokenized_sentence))['compound']) >= 0.5:
            pos_word_list.append(str(tokenized_sentence))
        elif (sid.polarity_scores(str(tokenized_sentence))['compound']) <= -0.5:
            neg_word_list.append(str(tokenized_sentence))
        else:
            neu_word_list.append(str(tokenized_sentence))
        print(row['c1'], row['c2'],
              "Positive : {}, Neutral: {}, Negative : {}, "
              "Positive words: {}, Neutral words: {}, Negative words: {}".format(
                  row['positive'], row['neutral'], row['negative'],
                  row['pos_word_list'], row['neu_word_list'], row['neg_word_list']))

    df.to_csv('Comments_modified.csv')
    df.head(15)

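Expected output: a Comments_modified file containing the columns c1 (serial number), c2 (comment), Positive (positive sentiment score), Neutral (neutral sentiment score), and Negative (negative sentiment score), with the values filled in on the corresponding rows for all 10,000 comments.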
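The file layout described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the input is taken to be tab-separated with a header row naming c1 and c2, the standard `csv` module is used in place of pandas so the sketch stays self-contained, and `add_score_columns`, `score_fn`, and `fake_scores` are hypothetical names. `score_fn` is any callable that returns a dict with 'pos', 'neu', and 'neg' keys, which is the shape `SentimentIntensityAnalyzer.polarity_scores` returns.

```python
import csv
import io
from typing import Callable, Dict, TextIO


def add_score_columns(
    src: TextIO,
    dst: TextIO,
    score_fn: Callable[[str], Dict[str, float]],
) -> int:
    """Copy tab-separated rows from src to dst as CSV, adding score columns.

    Assumes src has a header row with columns c1 and c2 (as in the question).
    score_fn is expected to return a dict with 'pos', 'neu' and 'neg' keys.
    Returns the number of data rows written.
    """
    reader = csv.DictReader(src, delimiter="\t")
    fieldnames = ["c1", "c2", "Positive", "Neutral", "Negative"]
    writer = csv.DictWriter(dst, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    count = 0
    for row in reader:
        scores = score_fn(row["c2"])  # pass the comment as a plain string
        writer.writerow({
            "c1": row["c1"],
            "c2": row["c2"],
            "Positive": scores["pos"],
            "Neutral": scores["neu"],
            "Negative": scores["neg"],
        })
        count += 1
    return count


def fake_scores(text: str) -> Dict[str, float]:
    """Stand-in scorer for illustration only; not a real sentiment model."""
    if "good" in text:
        return {"pos": 0.5, "neu": 0.5, "neg": 0.0}
    return {"pos": 0.0, "neu": 1.0, "neg": 0.0}


source = io.StringIO("c1\tc2\n1\tgood service\n2\tarrived on Monday\n")
target = io.StringIO()
rows_written = add_score_columns(source, target, fake_scores)
```

With NLTK and its `vader_lexicon` resource installed, `SentimentIntensityAnalyzer().polarity_scores` could presumably be passed as `score_fn`, with the real files opened using `encoding="utf-8"` and `newline=""`.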

Error: I get the following error, and I also get None for the positive, neutral, and negative words in every row.
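One likely reason for the None values: a Python function that ends without a `return` statement returns `None`, and `Series.apply` stores whatever the function returns. The sketch below illustrates just that behaviour. The names `no_return` and `with_return` and the placeholder `buckets` dict are hypothetical, and a list comprehension stands in for `apply` so the example needs no pandas.

```python
from typing import Dict, List


def no_return(text: str, k: str = "positive") -> None:
    """Builds a result but never returns it, so callers receive None."""
    buckets = {"positive": text.split(), "neutral": [], "negative": []}
    selected = buckets[k]  # computed, then discarded when the function ends


def with_return(text: str, k: str = "positive") -> List[str]:
    """Same shape, but hands the selected list back to the caller."""
    buckets: Dict[str, List[str]] = {
        "positive": text.split(),  # placeholder: no real sentiment logic here
        "neutral": [],
        "negative": [],
    }
    return buckets[k]


comments = ["great product", "love it"]
# Similar in spirit to df.c2.apply(func, k='positive'), which forwards
# extra keyword arguments to the function.
broken = [no_return(c, k="positive") for c in comments]
fixed = [with_return(c, k="positive") for c in comments]
```

Here `broken` holds only `None` values, while `fixed` holds the lists the function returned.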

How can I achieve the expected result?

    Traceback (most recent call last):
      File "", line 26, in
        if (sid.polarity_scores(tokenized_sentence)['compound']) >= 0.5:
      File "C:\ProgramData\Anaconda3\lib\site-packages\nltk\sentiment\vader.py", line 353, in polarity_scores
        sentitext = SentiText(text)
      File "C:\ProgramData\Anaconda3\lib\site-packages\nltk\sentiment\vader.py", line 284, in __init__
        text = str(text.encode('utf-8'))
    AttributeError: 'list' object has no attribute 'encode'
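The traceback suggests that `polarity_scores` was handed a list of tokens: in the NLTK version shown, it calls `.encode` on its argument, so it expects a single string. One possible way around this is to score each word on its own and sort the words into three lists. The sketch below is a guess at the intent, not a confirmed fix. `bucket_words`, `score_word`, `vader_word_scorer`, and `toy_lexicon` are hypothetical names, the ±0.05 cut-off is an assumption (the question used ±0.5), and a simple regular expression stands in for `nltk.word_tokenize`.

```python
import re
from typing import Callable, Dict, List


def bucket_words(
    text: str,
    score_word: Callable[[str], float],
    threshold: float = 0.05,
) -> Dict[str, List[str]]:
    """Sort the words of one comment into positive/neutral/negative lists.

    score_word maps a single word (a str, never a list) to a compound-style
    score in [-1, 1]. The threshold value is an assumption, not from the source.
    """
    buckets: Dict[str, List[str]] = {"positive": [], "neutral": [], "negative": []}
    for word in re.findall(r"[A-Za-z']+", str(text)):
        score = score_word(word)
        if score >= threshold:
            buckets["positive"].append(word)
        elif score <= -threshold:
            buckets["negative"].append(word)
        else:
            buckets["neutral"].append(word)
    return buckets


def vader_word_scorer() -> Callable[[str], float]:
    """Build a scorer backed by VADER (requires nltk and its vader_lexicon)."""
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    sid = SentimentIntensityAnalyzer()
    return lambda word: sid.polarity_scores(word)["compound"]


# Toy lexicon so the example runs without NLTK; the values are made up.
toy_lexicon = {"great": 0.6, "terrible": -0.6}
result = bucket_words(
    "Great food, terrible parking",
    lambda w: toy_lexicon.get(w.lower(), 0.0),
)
```

With pandas, something like `df['positive_words'] = df.c2.apply(lambda t: bucket_words(t, scorer)['positive'])` with `scorer = vader_word_scorer()` might then fill the adjacent columns. Injecting the scorer keeps the bucketing logic testable without downloading NLTK data.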

0 answers:

No answers yet