Match each sentence against lists of positive and negative words in R

Time: 2017-12-11 16:27:03

Tags: r match sentiment-analysis

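I have a column of 741 reviews. I want to add two more columns holding the number of positive words and the number of negative words in each review.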

Can anyone help me?

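I tried the function below, but it returns a single score per review: the number of positive words minus the number of negative words. What I want instead is two columns containing only the positive and negative word counts.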

    score.sentiment = function(sentences, pos.words, neg.words, .progress='none')
    {
      require(plyr)
      require(stringr)

      scores = laply(sentences, function(sentence, pos.words, neg.words) {

        # strip punctuation, control characters, and digits
        sentence = gsub('[[:punct:]]', '', sentence)
        sentence = gsub('[[:cntrl:]]', '', sentence)
        sentence = gsub('\\d+', '', sentence)

        sentence = tolower(sentence)

        # split the sentence into words on whitespace
        word.list = str_split(sentence, '\\s+')
        words = unlist(word.list)

        # match() returns the position of each word in the list, or NA
        pos.matches = match(words, pos.words)
        neg.matches = match(words, neg.words)

        # convert positions to TRUE/FALSE
        pos.matches = !is.na(pos.matches)
        neg.matches = !is.na(neg.matches)

        # net score: positive hits minus negative hits
        score = sum(pos.matches) - sum(neg.matches)

        return(score)
      }, pos.words, neg.words, .progress=.progress)

      scores.df = data.frame(score=scores, text=sentences)
      return(scores.df)
    }
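
One possible way to get the two separate counts is sketched below. It is a minimal base-R variant of the function above, not an accepted answer from this page. The function name `count.sentiment.words` and the tiny word lists in the usage example are assumptions made for illustration. The cleaning steps are the same as in `score.sentiment`, but the positive and negative hits are kept apart instead of being subtracted.

```r
# Hypothetical helper (not from the original post): returns one row per
# review with separate positive and negative word counts.
count.sentiment.words <- function(sentences, pos.words, neg.words) {
  # Clean and tokenize one sentence, using the same steps as score.sentiment
  tokenize <- function(sentence) {
    sentence <- gsub("[[:punct:]]", "", sentence)
    sentence <- gsub("[[:cntrl:]]", "", sentence)
    sentence <- gsub("\\d+", "", sentence)
    sentence <- tolower(sentence)
    unlist(strsplit(sentence, "\\s+"))
  }

  words.list <- lapply(sentences, tokenize)

  # %in% gives TRUE/FALSE per word; sum() counts the TRUE values
  positive <- vapply(words.list, function(w) sum(w %in% pos.words), integer(1))
  negative <- vapply(words.list, function(w) sum(w %in% neg.words), integer(1))

  data.frame(text = sentences, positive = positive, negative = negative,
             stringsAsFactors = FALSE)
}

# Usage with illustrative data (assumed, not the asker's 741 reviews)
reviews   <- c("Great product, really good!", "Bad and awful. Terrible!")
pos.words <- c("good", "great")
neg.words <- c("bad", "awful", "terrible")
counts.df <- count.sentiment.words(reviews, pos.words, neg.words)
print(counts.df)
```

The `positive` and `negative` columns of the result can then be attached to the original data frame, for example with `cbind()`, assuming the reviews are passed in the same row order.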

0 answers:

No answers