Removing words from a sentiment lexicon in R

Time: 2018-04-14 00:24:13

Tags: r sentiment-analysis

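I am using the nrc, bing and afinn lexicons for sentiment analysis.

Now I would like to remove some specific words from these lexicons, but I don't know how to do that, because the lexicons are not saved in my environment.

My code looks like this (using nrc as an example):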

MyTextFile %>%
  inner_join(get_sentiments("nrc")) %>%
  count(sentiment, sort = TRUE)

2 answers:

Answer 0: (score: 0)

Here are two ways to do it (there are doubtless more). First, note that the nrc lexicon has 13,901 rows, one per word-sentiment pair:

> library(tidytext)
> library(dplyr)
> sentiments <- get_sentiments("nrc")
> sentiments
# A tibble: 13,901 x 2
   word        sentiment
   <chr>       <chr>
 1 abacus      trust
 2 abandon     fear
 3 abandon     negative
 4 abandon     sadness
 5 abandoned   anger
 6 abandoned   fear
... and so on

You can filter out every word in a particular sentiment category (this leaves fewer rows, 12,425):

> sentiments <- get_sentiments("nrc") %>% filter(sentiment!="fear")
> sentiments
# A tibble: 12,425 x 2
   word        sentiment
   <chr>       <chr>
 1 abacus      trust
 2 abandon     negative
 3 abandon     sadness
 4 abandoned   anger
 5 abandoned   negative
 6 abandoned   sadness

Or you can create your own list of dropwords and remove those words from the lexicon (this also leaves fewer rows, 13,884):

> dropwords <- c("abandon","abandoned","abandonment","abduction","aberrant")
> sentiments <- get_sentiments("nrc") %>% filter(!word %in% dropwords)
> sentiments
# A tibble: 13,884 x 2
   word        sentiment
   <chr>       <chr>
 1 abacus      trust
 2 abba        positive
 3 abbot       trust
 4 aberration  disgust
 5 aberration  negative
 6 abhor       anger

Then you simply run the sentiment analysis using the sentiments object you have created:

> library(gutenbergr)
> hgwells <- gutenberg_download(35) # loads "The Time Machine"
> hgwells %>% unnest_tokens(word,text) %>% inner_join(sentiments) %>% count(word,sort=TRUE)
Joining, by = "word"
# A tibble: 1,077 x 2
   word         n
   <chr>    <int>
 1 white      236
 2 feeling    200
 3 time       200
 4 sun        145
 5 found      132
 6 darkness   108

Hope this helps.
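Since the question mentions three lexicons, the same filter can be wrapped in a small function and applied to each of them. The following is only a sketch: `drop_words` is a hypothetical helper name, and the three tibbles are made-up stand-ins so the example runs without downloading anything. It assumes only that every lexicon has a `word` column.

```r
library(dplyr)

# Hypothetical helper: keep only the lexicon rows whose word is not in `dropwords`.
drop_words <- function(lexicon, dropwords) {
  lexicon %>% filter(!word %in% dropwords)
}

# Made-up stand-ins for the real lexicons (illustrative rows only).
toy_lexicons <- list(
  nrc   = tibble(word = c("abandon", "abacus"), sentiment = c("fear", "trust")),
  bing  = tibble(word = c("abandon", "adore"),  sentiment = c("negative", "positive")),
  afinn = tibble(word = c("abandon", "adore"),  value = c(-2, 3))
)

dropwords <- c("abandon")

# Apply the same removal to every lexicon in the list.
cleaned <- lapply(toy_lexicons, drop_words, dropwords = dropwords)
```

With the real data, the list entries would presumably be replaced by `get_sentiments("nrc")`, `get_sentiments("bing")` and `get_sentiments("afinn")`.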

Answer 1: (score: 0)

If you can build a data frame of the words you want to remove, you can exclude them with anti_join:

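word_list <- c("words","to","remove")
# the column must be named "word" so that anti_join can match it against the lexicon
words_to_remove <- data.frame(word=word_list, stringsAsFactors=FALSE)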

MyTextFile %>%
  inner_join(get_sentiments("nrc")) %>%
  anti_join(words_to_remove) %>%
  count(sentiment, sort = TRUE)
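A variation on the same idea is to remove the words from the lexicon first and then join the text against the trimmed lexicon. The sketch below uses made-up data so that it is self-contained: `toy_lexicon`, `toy_tokens` and `custom_lexicon` are hypothetical names, and `toy_lexicon` only imitates the two-column shape of `get_sentiments("nrc")`.

```r
library(dplyr)

# Made-up stand-in for get_sentiments("nrc"): same columns, illustrative rows.
toy_lexicon <- tibble(
  word      = c("abandon", "abandon", "happy", "storm"),
  sentiment = c("fear", "sadness", "joy", "fear")
)

# Made-up tokenised text, one word per row.
toy_tokens <- tibble(word = c("happy", "abandon", "storm", "happy", "table"))

# The column is called `word` so that it matches the lexicon.
words_to_remove <- tibble(word = c("abandon"))

# Drop the unwanted words from the lexicon, then join the text against it.
custom_lexicon <- toy_lexicon %>%
  anti_join(words_to_remove, by = "word")

result <- toy_tokens %>%
  inner_join(custom_lexicon, by = "word") %>%
  count(sentiment, sort = TRUE)
```

Passing `by = "word"` explicitly makes the join key visible and avoids the "Joining, by" message.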