Extracting a sentiment calculation for each row of a data frame

Time: 2018-09-30 08:41:46

Tags: r text-mining tidyr sentiment-analysis

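I have a data frame with rows of text. For each row of text I want to extract a vector of specific sentiments, where each entry is binary: 0 if the sentiment is absent, 1 if it is present.
There are 5 sentiments in total, but I want a 1 only for the one that seems to be the strongest.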

An example of what I have tried:

library(tidytext)
text = data.frame(id = c(11,12,13), text=c("bad movie","good movie","I think it would benefit religious people to see things like this, not just to learn about our home, the Universe, in a fun and easy way, but also to understand that non- religious explanations don't leave people hopeless and"))
nrc_lexicon <- get_sentiments("nrc")

Example of the expected output:

id  text          sadness  anger  joy  love  neutral
11  "bad movie"   1        0      0    0     0
12  "good movie"  0        0      1    0     0

Any hint would be helpful.

For example, what is the next step to do this for every row?
How do I call the nrc lexicon analysis on each row?

for (i in 1:nrow(text)) {
  # which function should be called here with (text$text[i], nrc_lexicon)?
}

1 Answer:

Answer 0 (score: 1)

How about this:

library(tidytext)   # library for text
library(dplyr)

# your data
text <- data.frame(id = c(11,12,13),
 text=c("bad movie","good movie","I think it would benefit religious
 people to see things like this, not just to learn about our home, 
the Universe, in a fun and easy way, but also to understand that non- religious
 explanations don't leave people hopeless and"), stringsAsFactors = FALSE)  # set stringsAsFactors = FALSE so text stays character

# the lexicon
nrc_lexicon <- get_sentiments("nrc")

# now the job
unnested <- text %>%
             unnest_tokens(word, text) %>%            # one row per word
             left_join(nrc_lexicon, by = "word") %>%  # join with the lexicon to get sentiments
             left_join(text, by = "id")               # join back to your data to recover the full text

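Here is the output as a table by id. You could also tabulate by text, but since the third text is very long I did not do so; you can easily get that by using unnested$text in place of unnested$id:

table_sentiment <- table(unnested$id, unnested$sentiment)
table_sentiment

     anger anticipation disgust fear joy negative positive sadness surprise trust
  11     1            0       1    1   0        1        0       1        0     0
  12     0            1       0    0   1        0        1       0        1     1
  13     0            1       0    1   1        2        3       2        1     0

If you want it as a data.frame:

df_sentiment <- as.data.frame.matrix(table_sentiment)

Now you can do whatever you want with it. For example, if I understand correctly, you want a binary output (whether each sentiment is present or not).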
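A minimal sketch of that binary step in base R, plus one possible way to keep a 1 only for the strongest sentiment, as the question asks. The counts are typed in by hand from the table in this answer as a stand-in for df_sentiment, so the example runs without downloading the NRC lexicon. The names m, binary, top and one_hot are illustrative, and breaking ties by taking the first column is an assumption, not something the question specifies.

```r
# Stand-in for df_sentiment: counts copied by hand from the table in this answer
sentiments <- c("anger", "anticipation", "disgust", "fear", "joy",
                "negative", "positive", "sadness", "surprise", "trust")
df_sentiment <- as.data.frame(
  matrix(c(1, 0, 1, 1, 0, 1, 0, 1, 0, 0,
           0, 1, 0, 0, 1, 0, 1, 0, 1, 1,
           0, 1, 0, 1, 1, 2, 3, 2, 1, 0),
         nrow = 3, byrow = TRUE,
         dimnames = list(c("11", "12", "13"), sentiments))
)

m <- as.matrix(df_sentiment)

# Presence/absence: 1 if the sentiment occurs at least once in the row, else 0
binary <- (m > 0) * 1

# Strongest sentiment only: column index of the row maximum.
# ties.method = "first" sends ties to the leftmost column (arbitrary choice);
# a row with no matches at all would also get its first column, so filter
# such rows beforehand if that matters.
top <- max.col(m, ties.method = "first")
one_hot <- m * 0
one_hot[cbind(seq_len(nrow(m)), top)] <- 1

binary
one_hot
```

Both results are matrices with the ids as row names; as.data.frame(binary) or as.data.frame(one_hot) turns them back into data frames if needed.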