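Sentiment analysis with R (code not working properly)

Time: 2016-06-21 06:38:34

Tags: r sentiment-analysis lexicon

I am trying to run some sentiment analysis on text using a lexicon-based scoring method. After reading the Stack Overflow post "R sentiment analysis with phrases in dictionaries", I borrowed my code directly from http://analyzecore.com/2014/04/28/twitter-sentiment-analysis/.

Here is a summary of my dataset: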

> summary(data$text)
   Length     Class      Mode 
       30 character character 
> str(data$text)
 chr [1:30] "Hey everybody, are you guys free on Sunday for a game play + dinner afterwards? I'll reserve a"| __truncated__ ...

And here is the code I am using:

library(plyr)        # provides laply()
library(stringr)     # provides str_split()
library(data.table)
score.sentiment = function(sentences, pos.words, neg.words, .progress='none')
{
  scores = laply(sentences, function(sentence, pos.words, neg.words) {

    sentence = gsub('[[:punct:]]', '', sentence)
    sentence = gsub('[[:cntrl:]]', '', sentence)
    sentence = gsub('\\d+', '', sentence)
    # and convert to lower case:
    sentence = tolower(sentence)

    # split into words. str_split is in the stringr package
    word.list = str_split(sentence, '\\s+')
    # sometimes a list() is one level of hierarchy too much
    words = unlist(word.list)

    # compare our words to the dictionaries of positive & negative terms
    pos.matches = match(words, pos.words)
    neg.matches = match(words, neg.words)

    pos.matches = !is.na(pos.matches)
    neg.matches = !is.na(neg.matches)

    # and conveniently enough, TRUE/FALSE will be treated as 1/0 by sum():
    score = (sum(pos.matches) - sum(neg.matches))

    return(score)
  } , pos.words, neg.words, .progress=.progress)

  scores.df = data.frame(score = scores, text = sentences)
  return(scores.df)
}

I am using Bing Liu's opinion lexicon, which I load like this:

pos_BL = read.table(file = 'positive-words.txt', stringsAsFactors = FALSE)
neg_BL = read.table(file = 'negative-words.txt', stringsAsFactors = FALSE)
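
It can help to inspect what read.table() actually returns at this point. The following is only a minimal sketch: it writes a hypothetical three-word temporary file in place of positive-words.txt, and the names tmp, pos_demo and pos_vec are made up for illustration.

```r
# Hypothetical stand-in for positive-words.txt: one word per line
tmp <- tempfile(fileext = ".txt")
writeLines(c("good", "great", "happy"), tmp)

pos_demo <- read.table(file = tmp, stringsAsFactors = FALSE)

class(pos_demo)            # "data.frame", not a character vector
pos_vec <- pos_demo[, 1]   # extract the single column (same as pos_demo$V1)
class(pos_vec)             # "character"
length(pos_vec)            # 3

unlink(tmp)                # remove the temporary file
```

So the object read from the file is a one-column data frame, and the words themselves live in its first column.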

Here is the code I use to run my data and the dictionaries through the scoring function:

score_result = score.sentiment(sentences = data$text, 
                               pos.words = pos_BL, 
                               neg.words = neg_BL, 
                               .progress= 'text')

However, no matter what I do, all 30 of my strings get a score of 0 (see the output summary table below):

> table(score_result$score)
 0 
30 

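I don't know where to fix this (I already found plenty of mistakes in my own code before posting this question). Any help is much appreciated!

2 answers:

Answer 0: (score: 0)

An example: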

library(dplyr) 
mutate(df, views_per_user01=views01/users01,                
           views_per_user02=views02/users02)

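Answer 1: (score: 0)

Make sure you pass character vectors, not tables or data frames, as the "pos.words" and "neg.words" arguments of the "score.sentiment" function. read.table() returns a data frame, and match() compares each word against whole columns of a data frame rather than against the individual lexicon entries, so nothing matches and every score is 0. It also takes longer to run. Try something like this: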

score_result = score.sentiment(sentences = data$text, 
                               pos.words = as.character(pos_BL[ , 1]), 
                               neg.words = as.character(neg_BL[ , 1]), 
                               .progress= 'text')

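The 'as.character()' call is probably unnecessary, because the lexicons were read with stringsAsFactors disabled and the column should already be a character vector.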
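
To see why the data frame gives zeros, here is a small sketch in base R. The three-word lexicon and the names pos_df, words, df_matches and vec_matches are made up for illustration; they stand in for pos_BL and one cleaned sentence.

```r
# Hypothetical three-word lexicon standing in for pos_BL
pos_df <- data.frame(V1 = c("good", "great", "happy"),
                     stringsAsFactors = FALSE)

# One sentence, already lower-cased and split into words
words <- c("what", "a", "great", "day")

# Data frame as the lookup table: match() sees one element per column,
# so no single word can ever equal it
df_matches <- match(words, pos_df)

# Character vector as the lookup table: words are compared one by one
vec_matches <- match(words, pos_df[, 1])

sum(!is.na(df_matches))    # 0 positive hits
sum(!is.na(vec_matches))   # 1 positive hit ("great")
```

The same applies to neg.words, which is why the difference of the two sums was always 0 in the question.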