ggplot: ordering an axis with flipped coordinates and facets

Time: 2018-09-07 02:22:04

Tags: r ggplot2 lda

I have a dataset like this (LDA output).

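library(tidytext)  # assumed source of the tidy() method for LDA models
library(dplyr)
library(ggplot2)

# One row per topic-term pair, with the per-topic term probability in beta
lda_tt <- tidy(ldaOut)

# Keep the 10 highest-beta terms per topic (ties can yield more than 10 rows)
lda_tt <- lda_tt %>%
        group_by(topic) %>%
        top_n(10, beta) %>%
        ungroup() %>%
        arrange(topic, -beta)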

    topic  term        beta
1   1      council     0.044069733
2   1      report      0.020086205
3   1      budget      0.016918569
4   1      polici      0.01646605
5   1      term        0.015051927
6   1      annual      0.014938797
7   1      control     0.014316583
8   1      audit       0.013637803
9   1      rate        0.012732765
10  1      fund        0.011997421
11  2      debt        0.033760856
12  2      plan        0.030379431
13  2      term        0.02925229
14  2      fiscal      0.021836885
15  2      polici      0.017802904
16  2      mayor       0.015548621
17  2      transpar    0.013175692
18  2      relat       0.012997722
19  2      capit       0.012463813
20  2      long        0.011989227
21  2      remain      0.011989227
22  3      parti       0.031795751
23  3      elect       0.029929187
24  3      govern      0.025496098
25  3      mayor       0.023046232
26  3      district    0.014588364
27  3      public      0.014471704
28  3      administr   0.013596752
29  3      budget      0.011730188
30  3      polit       0.011730188
31  3      seat        0.010563586
32  3      state       0.010563586
33  4      budget      0.037069484
34  4      revenu      0.025043026
35  4      account     0.018459577
36  4      oper        0.01721546
37  4      tax         0.015867667
38  4      debt        0.014416198
39  4      compani     0.013690464
40  4      expenditur  0.012135318
41  4      consolid    0.011305907
42  4      increas     0.010891202
43  5      invest      0.026534237
44  5      elect       0.023341538
45  5      administr   0.022296654
46  5      improv      0.02189031
47  5      develop     0.019162003
48  5      project     0.017826874
49  5      transport   0.016375647
50  5      local       0.016317598
51  5      infrastr    0.014401978
52  5      servic      0.014111733

I want to create 5 plots, one per topic, each ordered by beta. Here is the code:

    lda_tt %>%
        mutate(term = reorder(term, beta)) %>%
        ggplot(aes(term, beta, fill = factor(topic))) +
        geom_bar(alpha = 0.8, stat = "identity", show.legend = FALSE) +
        facet_wrap(~ topic, scales = "free") +
        coord_flip()

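I get this plot (Terms by beta). As you can see, despite the reordering, the terms are not sorted by beta within each panel. For example, the term "budget" should be at the top of topic 4, "invest" should be at the top of topic 5, and so on. How can I order the terms within each topic in each facet? There are several questions on Stack Overflow about ordering in ggplot, but none of them helped me solve this problem.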

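1 Answer:

Answer 0: (score: 1)

The link suggested by Tung provided a way to solve the problem. It seems that each term needs to be encoded as a distinct factor level per topic to get the ordering right: `reorder(term, beta)` builds a single set of levels for the whole data frame, so a term that appears in several topics (such as "budget") gets only one position, shared by every facet. We can append "_" and the topic number to each term (done in the `mutate()` call) and then display only the term, without "_" and the topic number (the final `scale_x_discrete()` line takes care of this). The following code produces a faceted plot with the proper ordering.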

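    lda_tt %>%
        # Build a unique key per (term, topic) and convert it to a factor.
        # The rows are already sorted by topic and descending beta, so the
        # reversed row order puts the largest beta at the top after coord_flip().
        mutate(term = factor(paste(term, topic, sep = "_"),
                             levels = rev(paste(term, topic, sep = "_")))) %>%
        ggplot(aes(term, beta, fill = factor(topic))) +
        geom_bar(alpha = 0.8, stat = "identity", show.legend = FALSE) +
        facet_wrap(~ topic, scales = "free") +
        coord_flip() +
        # Strip "_" and the topic number from the axis labels
        scale_x_discrete(labels = function(x) gsub("_.+$", "", x))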
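
To see why the suffix is needed, here is a minimal base-R sketch on a toy subset of the table from the question (eight rows picked for illustration; the names `df`, `global_levels`, `term_key` and `strip_topic` are made up for this example). It compares the levels produced by a global `reorder()` with the per-topic keys used above. It only inspects factor levels and draws no plot.

```r
# Toy subset of the question's table: "budget" appears in topics 1, 3 and 4.
# Rows are sorted by topic and descending beta, as in the original data.
df <- data.frame(
  topic = c(1, 1, 1, 3, 3, 4, 4, 4),
  term  = c("council", "report", "budget",
            "parti", "budget",
            "budget", "revenu", "account"),
  beta  = c(0.044069733, 0.020086205, 0.016918569,
            0.031795751, 0.011730188,
            0.037069484, 0.025043026, 0.018459577),
  stringsAsFactors = FALSE
)

# Global reorder: one level per term, ranked by its mean beta over ALL topics.
# "budget" lands below "revenu" here, although it has the highest beta in topic 4.
global_levels <- levels(reorder(df$term, df$beta))
print(global_levels)

# Per-topic keys: "budget_1", "budget_3" and "budget_4" are separate levels,
# so each facet can carry its own ordering.
key <- paste(df$term, df$topic, sep = "_")
df$term_key <- factor(key, levels = rev(key))
print(levels(df$term_key))

# Label function equivalent to the one passed to scale_x_discrete() above.
# Note: it cuts at the first "_", so terms containing "_" would be truncated.
strip_topic <- function(x) gsub("_.+$", "", x)
print(strip_topic(levels(df$term_key)))
```

As an alternative to building the keys by hand, more recent versions of the tidytext package provide `reorder_within()` and `scale_x_reordered()`, which wrap the same idea; whether they are available depends on the installed version.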