我正在尝试重现《用R进行文本挖掘》这本书的4.1.3节中的图4.3。 sentiment analysis
本节试图将所有二元组按四个否定单词“ not”,“ no”,“ never”和“ without”分组,并针对每个分组绘制情感贡献(仅按“否定词,表示对该书的错误贡献)。
因此,我将把单词绘制为y轴,将贡献作为x轴,并且为了使图看起来更好,我还希望每个组的条形按降序排列。因此,与前面的部分类似,我使用贡献值重新排列单词的级别。
但是这里的问题是,在每个组中,这些词将有不同的贡献。例如,在第1组中,“快乐”比“希望”出现的多,因此它的贡献更大,但是在第2组中,相反。更糟糕的是,当数据帧为mutate(word2 = reorder(word2, contribution))
时,我无法group_by(word1)
。
这本书能够很好地制作出应该的图,所以我想有一些方法可以根据不同的群体对水平进行重新排序。
下面是代码,#preparing the data for plotting
之前的所有内容都是从书中摘录的,因此应该没有问题,代码是我的。
library(dplyr)
library(tidytext)
library(janeaustenr)
library(tidyr)
#getting bigrams
austen_bigrams <- austen_books() %>%
unnest_tokens(bigram, text, token = "ngrams", n = 2)
bigrams_separated <- austen_bigrams %>%
separate(bigram, c("word1", "word2"), sep = " ")
#four negation words to look at
negation_words <- c("not", "no", "never", "without")
AFINN <- get_sentiments("afinn")
#get the sentiment score of words preceded by the four negation words
negated_words <- bigrams_separated %>%
filter(word1 %in% negation_words) %>% #word1 as negation words
inner_join(AFINN, by = c(word2 = "word")) %>% #word2 as the word following negation words
count(word1, word2, score, sort = TRUE) %>%
ungroup()
#preparing the data for plotting
bigrams_plot <- bigrams_separated %>%
filter(word1 %in% negation_words) %>%
inner_join(AFINN, by = c(word2 = "word")) %>% #getting sentiment score
count(word1, word2, score, sort = TRUE) %>%
mutate(contribution = n * score) %>% #defining contribution as n*score
group_by(word1) %>% #group by negation words
top_n(12,abs(contribution)) %>%
arrange(desc(abs(contribution))) %>%
ungroup() %>%
mutate(word2 = reorder(word2, contribution))
#plotting sentiment score contribution grouped by the four negation words
ggplot(bigrams_plot, aes(word2, n * score, fill = n * score > 0)) +
geom_col(show.legend = FALSE) +
facet_wrap(~word1, ncol = 2, scales = "free") +
coord_flip()
我在下面创建了一个简单的版本:
v1_grp <- c(rep('A',10),rep('B',10))
v2_Aterm <- sample(letters[1:10],10,replace=F)
v2_Bterm <- sample(letters[1:10],10,replace=F)
v3_score <- sample(-10:10,20,replace=T)
data1 <- data_frame(grp=v1_grp,term=c(v2_Aterm,v2_Bterm),score=v3_score)
dataplot <- data1 %>%
arrange(desc(score)) %>%
mutate(term=reorder(term,score))
ggplot(dataplot, aes(term,score,fill=score>0)) +
geom_col(show.legend = FALSE) +
facet_wrap(~grp, ncol = 2, scales = "free") +
coord_flip()
答案 0 :(得分:1)
(改编自https://drsimonj.svbtle.com/ordering-categories-within-ggplot2-facets)
dataplot <- data1 %>%
arrange(grp, score) %>%
mutate(order = row_number())
ggplot(dataplot, aes(order,score,fill=score>0)) +
geom_col(show.legend = FALSE) +
facet_wrap(~grp, ncol = 2, scales = "free") +
coord_flip() +
scale_x_continuous(
breaks = dataplot$order,
labels = dataplot$term,
expand = c(0,0)
)