I have a dataset like this (LDA output).
library(dplyr)
library(tidytext)

lda_tt <- tidy(ldaOut)
lda_tt <- lda_tt %>%
  group_by(topic) %>%
  top_n(10, beta) %>%  # ties in beta can return more than 10 rows per topic
  ungroup() %>%
  arrange(topic, -beta)
   topic term       beta
1      1 council    0.044069733
2      1 report     0.020086205
3      1 budget     0.016918569
4      1 polici     0.01646605
5      1 term       0.015051927
6      1 annual     0.014938797
7      1 control    0.014316583
8      1 audit      0.013637803
9      1 rate       0.012732765
10     1 fund       0.011997421
11     2 debt       0.033760856
12     2 plan       0.030379431
13     2 term       0.02925229
14     2 fiscal     0.021836885
15     2 polici     0.017802904
16     2 mayor      0.015548621
17     2 transpar   0.013175692
18     2 relat      0.012997722
19     2 capit      0.012463813
20     2 long       0.011989227
21     2 remain     0.011989227
22     3 parti      0.031795751
23     3 elect      0.029929187
24     3 govern     0.025496098
25     3 mayor      0.023046232
26     3 district   0.014588364
27     3 public     0.014471704
28     3 administr  0.013596752
29     3 budget     0.011730188
30     3 polit      0.011730188
31     3 seat       0.010563586
32     3 state      0.010563586
33     4 budget     0.037069484
34     4 revenu     0.025043026
35     4 account    0.018459577
36     4 oper       0.01721546
37     4 tax        0.015867667
38     4 debt       0.014416198
39     4 compani    0.013690464
40     4 expenditur 0.012135318
41     4 consolid   0.011305907
42     4 increas    0.010891202
43     5 invest     0.026534237
44     5 elect      0.023341538
45     5 administr  0.022296654
46     5 improv     0.02189031
47     5 develop    0.019162003
48     5 project    0.017826874
49     5 transport  0.016375647
50     5 local      0.016317598
51     5 infrastr   0.014401978
52     5 servic     0.014111733
I want to create five plots, one per topic, with the terms ordered by beta. This is the code:
library(ggplot2)

lda_tt %>%
  mutate(term = reorder(term, beta)) %>%
  ggplot(aes(term, beta, fill = factor(topic))) +
  geom_bar(alpha = 0.8, stat = "identity", show.legend = FALSE) +
  facet_wrap(~ topic, scales = "free") +
  coord_flip()
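I get this plot. As you can see, despite the reordering, the terms are not sorted by beta within each panel: for example, the term "budget" should be at the top of topic 4 and "invest" at the top of topic 5, and so on. How can I order the terms within each topic in each facet? There are several Stack Overflow questions about ordering in ggplot, but none of them helped me solve this.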
Answer 0 (score: 1)
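The link suggested by Tung provides a solution to this problem. It seems that each term needs to be encoded as a unique factor level to get the correct ordering: a term such as "budget" appears in several topics, but a single factor can give it only one position across all facets. We can append "_" and the topic number to each term (done on lines 2 and 3 of the code) but display only the term without the "_" and topic number (the last line of the code takes care of this). The following code produces a faceted plot with the proper ordering.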
lda_tt %>%
  # make each term unique per topic; this relies on lda_tt already being
  # sorted by topic and descending beta, so rev() puts the largest beta on top
  mutate(term = factor(paste(term, topic, sep = "_"),
                       levels = rev(paste(term, topic, sep = "_")))) %>%
  ggplot(aes(term, beta, fill = factor(topic))) +
  geom_bar(alpha = 0.8, stat = "identity", show.legend = FALSE) +
  facet_wrap(~ topic, scales = "free") +
  coord_flip() +
  # strip the "_" and topic number from the axis labels
  scale_x_discrete(labels = function(x) gsub("_.+$", "", x))
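As a possible alternative to pasting the suffix by hand, here is a minimal sketch that assumes your installed tidytext exports reorder_within() and scale_x_reordered() (recent versions do). It is not part of the original answer. The data frame lda_small and the object names lda_ordered and p are made up for illustration; the rows are a small subset copied from the table in the question.

```r
library(dplyr)
library(ggplot2)
library(tidytext)

# Illustrative subset of the question's table (topics 3 and 4 only).
# "budget" appears in both topics, which is what breaks a plain reorder().
lda_small <- data.frame(
  topic = c(3, 3, 3, 4, 4, 4),
  term  = c("parti", "mayor", "budget", "budget", "revenu", "debt"),
  beta  = c(0.031795751, 0.023046232, 0.011730188,
            0.037069484, 0.025043026, 0.014416198),
  stringsAsFactors = FALSE
)

# reorder_within() builds one factor level per term/topic pair and
# orders those levels by beta, so each facet gets its own ordering.
lda_ordered <- lda_small %>%
  mutate(term = reorder_within(term, beta, topic))

p <- ggplot(lda_ordered, aes(term, beta, fill = factor(topic))) +
  geom_col(alpha = 0.8, show.legend = FALSE) +
  facet_wrap(~ topic, scales = "free") +
  coord_flip() +
  scale_x_reordered()  # removes the separator and topic suffix from labels

p
```

The idea is the same as in the answer above (one factor level per term and topic), but the ordering comes from beta itself rather than from the row order of the data frame, so it should not depend on lda_tt having been arranged beforehand.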