I am using the nrc, bing and afinn lexicons for sentiment analysis.
Now I would like to remove some specific words from these lexicons, but I don't know how to do that, because the vocabulary is not stored as an object in my environment.
My code looks like this (using nrc as an example):
MyTextFile %>%
  inner_join(get_sentiments("nrc")) %>%
  count(sentiment, sort = TRUE)
Answer 0 (score: 0)
Here are two ways of doing it (no doubt there are more). First, note that the lexicon contains 13,901 words:

> library(tidytext)
> library(dplyr)
> sentiments <- get_sentiments("nrc")
> sentiments
# A tibble: 13,901 x 2
   word      sentiment
   <chr>     <chr>
 1 abacus    trust
 2 abandon   fear
 3 abandon   negative
 4 abandon   sadness
 5 abandoned anger
 6 abandoned fear
... and so on
You can filter out all the words in a particular sentiment category (leaving fewer words, at 12,425):

> sentiments <- get_sentiments("nrc") %>% filter(sentiment != "fear")
> sentiments
# A tibble: 12,425 x 2
   word      sentiment
   <chr>     <chr>
 1 abacus    trust
 2 abandon   negative
 3 abandon   sadness
 4 abandoned anger
 5 abandoned negative
 6 abandoned sadness
Or you can create your own dropwords list and remove those words from the lexicon (leaving fewer words, at 13,884):

> dropwords <- c("abandon", "abandoned", "abandonment", "abduction", "aberrant")
> sentiments <- get_sentiments("nrc") %>% filter(!word %in% dropwords)
> sentiments
# A tibble: 13,884 x 2
   word       sentiment
   <chr>      <chr>
 1 abacus     trust
 2 abba       positive
 3 abbot      trust
 4 aberration disgust
 5 aberration negative
 6 abhor      anger
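The question also mentions the bing and afinn lexicons. Since those also keep their vocabulary in a word column (bing pairs it with a sentiment label, afinn with a numeric score), the same filter(!word %in% dropwords) pattern should carry over. A minimal sketch, reusing the dropwords vector defined above:

# Sketch: drop the same words from the other two lexicons the question mentions.
# Assumes the `dropwords` vector defined above is still in the environment.
bing_trimmed  <- get_sentiments("bing")  %>% filter(!word %in% dropwords)
afinn_trimmed <- get_sentiments("afinn") %>% filter(!word %in% dropwords)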
You then simply use the sentiments object you have created to run your sentiment analysis:

> library(gutenbergr)
> hgwells <- gutenberg_download(35)  # loads "The Time Machine"
> hgwells %>% unnest_tokens(word, text) %>%
    inner_join(sentiments) %>% count(word, sort = TRUE)
Joining, by = "word"
# A tibble: 1,077 x 2
   word         n
   <chr>    <int>
 1 white      236
 2 feeling    200
 3 time       200
 4 sun        145
 5 found      132
 6 darkness   108
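And to get back to the shape of the code in the question (counts per sentiment rather than per word), the trimmed sentiments object slots straight into that pipeline. A sketch, assuming MyTextFile has already been tokenised into a word column:

# Sketch: the question's pipeline with the trimmed lexicon swapped in.
# Assumes MyTextFile is already one token per row with a `word` column.
MyTextFile %>%
  inner_join(sentiments, by = "word") %>%
  count(sentiment, sort = TRUE)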
Hope this helps.
Answer 1 (score: 0)
If you can make a data frame of the words you want to remove, you can use anti_join to exclude them:
word_list <- c("words", "to", "remove")
words_to_remove <- data.frame(word = word_list)  # column named `word` so the join key matches the lexicon

MyTextFile %>%
  inner_join(get_sentiments("nrc")) %>%
  anti_join(words_to_remove, by = "word") %>%
  count(sentiment, sort = TRUE)
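For what it's worth, here is a small self-contained sketch of that approach; MyTextFile and its contents are made up purely for illustration, and the nrc lexicon is assumed to be available locally:

library(dplyr)
library(tibble)
library(tidytext)

# Toy stand-in for MyTextFile: one token per row in a `word` column (invented data)
MyTextFile <- tibble(word = c("abandon", "cake", "trust", "abandon", "win"))

word_list       <- c("abandon", "win")       # words to exclude
words_to_remove <- tibble(word = word_list)  # same column name as the lexicon

MyTextFile %>%
  inner_join(get_sentiments("nrc"), by = "word") %>%  # attach nrc sentiment labels
  anti_join(words_to_remove, by = "word") %>%         # drop the unwanted words
  count(sentiment, sort = TRUE)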