Using NLTK SentiWordNet with Python

Asked: 2015-11-27 14:25:49

Tags: python python-2.7 twitter nltk senti-wordnet

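I am using Python NLTK to do sentiment analysis on Twitter data. I need a dictionary that contains words with positive and negative polarity. I have read a lot about SentiWordNet, but when I use it in my project it does not give efficient and fast results. I think I am not using it correctly. Can anyone tell me the right way to use it? Here are the steps I have taken so far: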

  1. Tokenize the tweets
  2. POS-tag the tokens
  3. Pass each tagged token to SentiWordNet

I am using the nltk package for tokenization and tagging. Here is part of my code:

    import nltk
    from nltk.corpus import sentiwordnet as swn

    # Penn Treebank tag prefix -> WordNet POS letter expected by senti_synsets
    TAG_TO_POS = (('NN', 'n'), ('VB', 'v'), ('JJ', 'a'), ('RB', 'r'))

    pscore = nscore = 0.0
    tokens = nltk.word_tokenize(row)  # tokenization; row is one line of the file in which the tweets are saved
    tagged = nltk.pos_tag(tokens)     # POS tagging

    for word, tag in tagged:
        for prefix, pos in TAG_TO_POS:
            if tag.startswith(prefix):
                # senti_synsets may return a lazy iterator, so build the list once before testing it
                synsets = list(swn.senti_synsets(word, pos))
                if synsets:
                    pscore += synsets[0].pos_score()  # positive score of the word's first sense
                    nscore += synsets[0].neg_score()  # negative score of the word's first sense
                break

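Finally, I count how many tweets are positive and how many are negative. Where am I going wrong? How should I use it? Are there other similar lexicons that are easy to use?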

2 Answers:

Answer 0 (score: 4)

Yes, there are other lexicons you can use. You can find a short list of them here: http://sentiment.christopherpotts.net/lexicons.html#resources Bing Liu's Opinion Lexicon looks quite easy to use.

Besides linking to those lexicons, the site is also a very good tutorial on sentiment analysis.
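
As a rough illustration of how a plain word-list lexicon such as Bing Liu's could be applied, here is a minimal sketch that labels a tokenized tweet by counting positive and negative hits. The function name `lexicon_polarity` and the tiny word sets are hypothetical and only there to keep the example self-contained. NLTK ships this lexicon as `nltk.corpus.opinion_lexicon` (after `nltk.download('opinion_lexicon')`), and the commented lines show how the real word lists could be loaded instead.

```python
def lexicon_polarity(tokens, positive_words, negative_words):
    """Label a token list 'Pos', 'Neg' or 'Neu' by counting lexicon hits."""
    pos_hits = sum(1 for t in tokens if t.lower() in positive_words)
    neg_hits = sum(1 for t in tokens if t.lower() in negative_words)
    if pos_hits > neg_hits:
        return 'Pos'
    if neg_hits > pos_hits:
        return 'Neg'
    return 'Neu'


# Tiny hypothetical word sets, for illustration only.
# With NLTK installed and the 'opinion_lexicon' corpus downloaded,
# the real lists could be loaded like this instead:
#   from nltk.corpus import opinion_lexicon
#   positive_words = set(opinion_lexicon.positive())
#   negative_words = set(opinion_lexicon.negative())
positive_words = {'good', 'great', 'love'}
negative_words = {'bad', 'awful', 'hate'}

print(lexicon_polarity('I love this great phone'.split(), positive_words, negative_words))     # Pos
print(lexicon_polarity('awful battery , bad screen'.split(), positive_words, negative_words))  # Neg
```

Converting the word lists to sets once keeps every lookup constant-time, which matters when scoring many tweets.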

Answer 1 (score: 0)

Computing the sentiment:

    from nltk.corpus import sentiwordnet as swn

    def classify_sentiment(all_words_in_comment):
        # all_words_in_comment: (word, pos) pairs, where pos is a WordNet POS letter ('n', 'v', 'a' or 'r')
        total_score = 0
        count_words_included = 0

        for word, pos in all_words_in_comment:
            synset_forms = list(swn.senti_synsets(word, pos))
            if not synset_forms:
                continue  # the word has no entry in SentiWordNet
            synset = synset_forms[0]  # use the first listed sense only
            total_score += synset.pos_score() - synset.neg_score()
            count_words_included += 1

        if count_words_included == 0:
            return 'N/A'  # no word could be scored
        if total_score == 0:
            return 'Neu'
        # the average has the same sign as the total, so the total decides the label
        return 'Neg' if total_score < 0 else 'Pos'
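
The loop above passes `word[1]` straight to `senti_synsets`, which expects a WordNet part-of-speech letter, while `nltk.pos_tag` produces Penn Treebank tags such as `NN` or `VBD`. A minimal sketch of the conversion, using the same four tag groups as the question, might look like this. The helper names `penn_to_wordnet` and `to_wordnet_pairs` are hypothetical, and the tagged sentence is hand-written sample input rather than real tagger output.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS letter, or None if unsupported."""
    if tag.startswith('NN'):
        return 'n'  # nouns
    if tag.startswith('VB'):
        return 'v'  # verbs
    if tag.startswith('JJ'):
        return 'a'  # adjectives
    if tag.startswith('RB'):
        return 'r'  # adverbs
    return None


def to_wordnet_pairs(tagged_tokens):
    """Keep only tokens whose tag maps to a WordNet POS, as (word, pos) pairs."""
    pairs = []
    for word, tag in tagged_tokens:
        pos = penn_to_wordnet(tag)
        if pos is not None:
            pairs.append((word, pos))
    return pairs


# Hand-written sample in the (word, tag) shape that nltk.pos_tag returns.
tagged = [('The', 'DT'), ('movie', 'NN'), ('was', 'VBD'), ('really', 'RB'), ('good', 'JJ')]
print(to_wordnet_pairs(tagged))
# [('movie', 'n'), ('was', 'v'), ('really', 'r'), ('good', 'a')]
```

Tokens with other tags (determiners, punctuation and so on) are dropped, since the question only scores nouns, verbs, adjectives and adverbs.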