Determining the sentiment polarity of a comment

Time: 2017-02-21 14:35:33

Tags: r

I have a comment written by a student:

The course was interesting, but the professor was so boring.

I also have a sentiment data frame that contains all sentiment words together with their polarity (positive and negative):

> sentiment_DF
word  positive_polarity  negative_polarity
interesting  1  0
boring  0  1
pretty  1  0
...

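I am trying to write an R function that determines the polarity of the sentiment words in a text. To do that, I first extract all the words from the text: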

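library(stringr)
sentence <- "The course was interesting, but the professor was so boring."
# split into words on whitespace; str_split is in the stringr package
word.list <- str_split(sentence, '\\s+')
# str_split returns a list (one element per input string), so flatten it
words <- unlist(word.list)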

Then I check whether each word in the list exists in sentiment_DF and determine its polarity. I tried this code:

library(data.table)
dt <- setDT(sentiment_DF)
dt <- melt(sentiment_DF, id.vars = "word")
dt[word == "b" & value > 0, variable]

Algorithm:

overall_sentiment <- 0
while there is sentiment_word in text do 
   polarity <- get_polarity(sentiment_word)
   overall_sentiment <- overall_sentiment + polarity
end while

Can you help me?

Thanks

---- Edit ----

The basic algorithm has been changed to the following version:

overall_sentiment <- 0
while there is sentiment_word in text do 
   polarity <- get_polarity(sentiment_word)
   if booster_word in context(sentiment_word)
     if negation_word in context(sentiment_word)
       polarity <- polarity/3
     else 
       polarity <- polarity*3
     end if
   end if
   overall_sentiment <- overall_sentiment + polarity
end while

booster_word <- c("more", "very", "too", "much", "completely", "absolutely", "fully", "totally", "definitely", "extremely", "often", "frequently", "enough", "a lot")
negation_word <- c("never", "nothing", "no", "not", "no more")

I wrote a function that extracts the context of a sentiment_word (the 3 words that precede a given word).

getContext <- function(text, look_for, pre = 3, post = pre) {
  # create vector of words (anything separated by whitespace)
  t_vec <- unlist(strsplit(text, '\\s+'))

  # find positions of matches
  matches <- which(t_vec == look_for)

  # return the words before each match, if any
  # (post is kept in the signature but unused: only preceding words are returned)
  if (length(matches) > 0) {
    out <- list(before = lapply(matches, function(m) {
      # NA when fewer than `pre` words precede the match
      if (m - pre < 1) NA else t_vec[(m - pre):(m - 1)]
    }))
    return(out)
  } else {
    warning('No matches')
  }
}

Here is an example:

"the course was very interesting, but the professor was too boring."
"Stackoverflow is an interesting place with too interesting people"

First sentence:

"the course was *very interesting*, but the professor was *too boring*."
 (1*3) + (-1*3) = 0

Second sentence:

"Stackoverflow is an *interesting* place with *too interesting* people"
 1+(1*3) = 4

My question now is: how can I check in R whether the context of a sentiment word contains a booster_word?

Thanks

2 answers:

Answer 0 (score: 2)

Maybe this works for you:

### function to calculate the polarity of sentences
calcPolarity <- function(sentiment_DF,sentences){

  # separate each sentence in words using regular expression 
  # (it returns a list with the words of each sentence)
  sentencesSplitInWords <- regmatches(sentences,gregexpr("[[:word:]]+",sentences,perl=TRUE))

  # pre-allocate the polarity result vector with size = number of sentences
  polarity <- rep.int(0,length(sentencesSplitInWords))

  for(i in 1:length(polarity)){
    # get the i-th sentence words
    wordsOfASentence <- sentencesSplitInWords[[i]]

    # get the rows of sentiment_DF corresponding to the words in the sentence using match
    # N.B. if a word occurs twice, there will be two equal rows 
    # (but I think it's correct since in this way you count its polarity twice)
    subDF <- sentiment_DF[match(wordsOfASentence,sentiment_DF$word,nomatch = 0),]

    # calculate the total polarity of the sentence and store in the vector
    polarity[i] <- sum(subDF$positive.polarity) - sum(subDF$negative.polarity)
  }
  return(polarity)
}

Usage:

sentiment_DF <- data.frame(word=c('interesting','boring','pretty'),
                           positive.polarity=c(1,0,1),
                           negative.polarity=c(0,1,0))
sentences <- c("The course was interesting, but the professor was so boring.",
               "stackoverflow is an interesting place with interesting people!")
result <- calcPolarity(sentiment_DF,sentences)

# > result
# [1] 0 2
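
The function above does not cover the booster/negation rule from the question's edit. The following is only a sketch of how that rule might be added, not part of the original answer. The name calcBoostedPolarity is hypothetical, the column names follow the usage example above, and the context is assumed to be the (up to) 3 words before each sentiment word. Multi-word entries such as "a lot" or "no more" will never match here, because the text is compared one word at a time.

```r
# Hypothetical helper: scores one sentence with the booster/negation rule
# described in the question's edited algorithm.
calcBoostedPolarity <- function(sentiment_DF, sentence,
                                booster_word, negation_word, pre = 3) {
  # lower-case and keep only runs of letters, so punctuation is dropped
  lowered <- tolower(sentence)
  words <- regmatches(lowered, gregexpr("[[:alpha:]]+", lowered))[[1]]

  overall_sentiment <- 0
  for (i in seq_along(words)) {
    row <- match(words[i], sentiment_DF$word)
    if (is.na(row)) next  # not a sentiment word

    polarity <- sentiment_DF$positive.polarity[row] -
      sentiment_DF$negative.polarity[row]

    # context = up to `pre` words before the sentiment word
    context <- if (i > 1) words[max(1, i - pre):(i - 1)] else character(0)

    if (any(context %in% booster_word)) {
      if (any(context %in% negation_word)) {
        polarity <- polarity / 3
      } else {
        polarity <- polarity * 3
      }
    }
    overall_sentiment <- overall_sentiment + polarity
  }
  overall_sentiment
}

sentiment_DF <- data.frame(word = c("interesting", "boring", "pretty"),
                           positive.polarity = c(1, 0, 1),
                           negative.polarity = c(0, 1, 0))
booster_word <- c("more", "very", "too", "much", "completely", "absolutely")
negation_word <- c("never", "nothing", "no", "not")

s1 <- calcBoostedPolarity(sentiment_DF,
  "the course was very interesting, but the professor was too boring.",
  booster_word, negation_word)
s2 <- calcBoostedPolarity(sentiment_DF,
  "Stackoverflow is an interesting place with too interesting people",
  booster_word, negation_word)
# s1 is 0 and s2 is 4, matching the hand-computed values in the question
```

The check `any(context %in% booster_word)` is what answers the question of whether the context contains a booster word; `%in%` returns one logical value per context word and `any` collapses them.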

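Answer 1 (score: 0)

You should extract the words first (probably with a regular expression, to make sure you don't end up with tokens such as "interesting," with the punctuation still attached). Store the words of the sentence in a variable named words_of_sentence. Then you can use: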

dt[word %in% words_of_sentence & value > 0, variable]
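
For completeness, here is a sketch of the whole pipeline around that line, assuming the data.table package is installed and that the data frame uses the column names positive.polarity and negative.polarity (an assumption; the question's own header is spelled differently). The regular expression is just one possible way to tokenize.

```r
library(data.table)

sentiment_DF <- data.frame(word = c("interesting", "boring", "pretty"),
                           positive.polarity = c(1, 0, 1),
                           negative.polarity = c(0, 1, 0))

# long format: one row per (word, polarity column) pair
dt <- melt(as.data.table(sentiment_DF), id.vars = "word")

sentence <- "The course was interesting, but the professor was so boring."
# keep only runs of letters, so "interesting," becomes "interesting"
words_of_sentence <- regmatches(sentence,
                                gregexpr("[[:alpha:]]+", sentence))[[1]]

# rows whose word occurs in the sentence and whose polarity flag is set
found <- dt[word %in% words_of_sentence & value > 0, .(word, variable)]
```

Here found should contain one row per sentiment word in the sentence, with variable naming the polarity column that is set for it.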