Ordering categorical data in a stacked bar plot with ggplot2

Time: 2011-08-22 16:21:51

Tags: r ggplot2 geom-bar

I have a matrix with the following entries:

dput(MilDis[1:200,])
structure(list(hhDomMil = c("HED", "ETB", "HED", "ETB", "PER", 
"BUM", "EXP", "TRA", "TRA", "PMA", "MAT", "MAT", "KON", "ETB", 
"PMA", "PMA", "HED", "BUM", "BUM", "HED", "PMA", "PMA", "HED", 
"TRA", "BUM", "EXP", "BUM", "PMA", "ETB", "MAT", "ETB", "ETB", 
"KON", "MAT", "TRA", "BUM", "BUM", "TRA", "TRA", "PMA", "PMA", 
"PMA", "MAT", "ETB", "TRA", "BUM", "TRA", "MAT", "BUM", "ETB", 
"TRA", "TRA", "BUM", "KON", "ETB", "ETB", "ETB", "BUM", "KON", 
"ETB", "ETB", "PMA", "TRA", "PER", "PER", "MAT", "HED", "KON", 
"TRA", "TRA", "TRA", "EXP", "TRA", "BUM", "MAT", "MAT", "TRA", 
"PMA", "HED", "PER", "TRA", "PER", "EXP", "PER", "BUM", "KON", 
"BUM", "ETB", "ETB", "TRA", "PER", "ETB", "KON", "KON", "BUM", 
"ETB", "BUM", "MAT", "BUM", "KON", "KON", "ETB", "MAT", "KON", 
"PER", "ETB", "ETB", "KON", "PMA", "PER", "HED", "HED", "PMA", 
"MAT", "PMA", "PER", "PMA", "TRA", "TRA", "MAT", "BUM", "BUM", 
"KON", "ETB", "ETB", "ETB", "PMA", "TRA", "TRA", "PMA", "PER", 
"KON", "PER", "BUM", "KON", "ETB", "ETB", "BUM", "TRA", "ETB", 
"PMA", "HED", "MAT", "TRA", "BUM", "PMA", "BUM", "ETB", "TRA", 
"TRA", "TRA", "PER", "EXP", "HED", "BUM", "EXP", "HED", "BUM", 
"MAT", "DDR", "BUM", "MAT", "KON", "HED", "HED", "TRA", "BUM", 
"PMA", "PMA", "PMA", "KON", "KON", "MAT", "ETB", "MAT", "TRA", 
"MAT", "ETB", "ETB", "TRA", "MAT", "ETB", "TRA", "HED", "BUM", 
"MAT", "TRA", "PMA", "BUM", "BUM", "EXP", "ETB", "EXP", "EXP", 
"MAT", "TRA", "KON", "BUM", "BUM", "HED"), kclust = c(1L, 2L, 
15L, 4L, 5L, 6L, 5L, 7L, 8L, 5L, 6L, 5L, 11L, 6L, 5L, 1L, 9L, 
10L, 2L, 1L, 9L, 8L, 4L, 11L, 14L, 5L, 8L, 11L, 12L, 5L, 5L, 
14L, 15L, 2L, 10L, 6L, 8L, 4L, 6L, 8L, 14L, 14L, 16L, 10L, 5L, 
1L, 12L, 17L, 12L, 16L, 16L, 5L, 10L, 14L, 8L, 19L, 5L, 4L, 4L, 
14L, 2L, 14L, 9L, 7L, 1L, 14L, 4L, 15L, 18L, 16L, 9L, 14L, 6L, 
14L, 12L, 11L, 4L, 7L, 8L, 12L, 9L, 16L, 2L, 6L, 15L, 1L, 1L, 
3L, 14L, 5L, 5L, 9L, 14L, 6L, 5L, 14L, 15L, 2L, 14L, 2L, 1L, 
8L, 5L, 10L, 1L, 1L, 16L, 5L, 2L, 9L, 9L, 1L, 12L, 10L, 1L, 4L, 
1L, 9L, 8L, 8L, 5L, 10L, 1L, 10L, 2L, 6L, 15L, 2L, 2L, 10L, 5L, 
6L, 10L, 19L, 19L, 6L, 5L, 6L, 7L, 7L, 8L, 5L, 16L, 5L, 6L, 6L, 
1L, 10L, 12L, 4L, 7L, 19L, 7L, 8L, 16L, 10L, 5L, 16L, 12L, 7L, 
7L, 19L, 4L, 6L, 1L, 15L, 7L, 8L, 16L, 4L, 10L, 15L, 11L, 10L, 
1L, 10L, 17L, 1L, 2L, 1L, 14L, 8L, 8L, 14L, 10L, 8L, 6L, 6L, 
8L, 5L, 7L, 5L, 1L, 5L, 7L, 9L, 2L, 1L, 9L, 14L), order = c(9, 
1, 9, 1, 3, 7, 10, 5, 5, 2, 8, 8, 4, 1, 2, 2, 9, 7, 7, 9, 2, 
2, 9, 5, 7, 10, 7, 2, 1, 8, 1, 1, 4, 8, 5, 7, 7, 5, 5, 2, 2, 
2, 8, 1, 5, 7, 5, 8, 7, 1, 5, 5, 7, 4, 1, 1, 1, 7, 4, 1, 1, 2, 
5, 3, 3, 8, 9, 4, 5, 5, 5, 10, 5, 7, 8, 8, 5, 2, 9, 3, 5, 3, 
10, 3, 7, 4, 7, 1, 1, 5, 3, 1, 4, 4, 7, 1, 7, 8, 7, 4, 4, 1, 
8, 4, 3, 1, 1, 4, 2, 3, 9, 9, 2, 8, 2, 3, 2, 5, 5, 8, 7, 7, 4, 
1, 1, 1, 2, 5, 5, 2, 3, 4, 3, 7, 4, 1, 1, 7, 5, 1, 2, 9, 8, 5, 
7, 2, 7, 1, 5, 5, 5, 3, 10, 9, 7, 10, 9, 7, 8, 6, 7, 8, 4, 9, 
9, 5, 7, 2, 2, 2, 4, 4, 8, 1, 8, 5, 8, 1, 1, 5, 8, 1, 5, 9, 7, 
8, 5, 2, 7, 7, 10, 1, 10, 10, 8, 5, 4, 7, 7, 9)), .Names = c("hhDomMil", 
"kclust", "order"), row.names = c(NA, 200L), class = "data.frame")

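I would like to create a stacked bar plot like this one: Barplot.

The only problem is that I want the stacks ordered as (ETB, PMA, PER, KON, TRA, DDR, BUM, MAT, HED, EXP), which is the order number in the matrix. I also have a few aesthetic problems. I searched here for a solution, but none of the ordering suggestions worked for me... :-\

  1. How can I plot it in this order?
  2. How can I set x so that each bar sits "on" one number?
  3. How can I separate the bars? Here I tried it with white borders...
  4. How can I print all the kclust numbers on x?

Thanks a lot for your help! Dominik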


Update

Here is the code I used to draw the plot:

    mycols <- c('#FFFD00', '#97CB00', '#3168FF', '#FF0200', '#FB02FE',
                '#CCFCCC', '#FE9900', '#98CBF8', '#00CCFF', '#00FD03') # Set milieu colors

    ggplot(MilDis) +
      geom_bar(aes(kclust, fill=factor(hhDomMil),
                   colour=mycols), position='fill', binwidth=1, colour='white') +
      scale_fill_manual(values = mycols)
    

Update 2:

This is how I did it:

    mycols <- c('#3168FF', '#00CCFF', '#98CBF8', '#CCFCCC', '#00FD03',
                '#97CB00', '#FFFD00', '#FE9900', '#FB02FE', '#FF0200') # Set milieu colors

    ggplot(MilDis) +
      geom_bar(aes(factor(kclust), fill=reorder(hhDomMil,order)),
               position='fill') +
      scale_fill_manual(values = mycols)

The result looks like this:

Image

Thanks everyone for your help!

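3 answers:

Answer 0 (score: 12)

This is easily solved by formatting the data properly before passing it to ggplot(). The key is to set the levels of the hhDomMil factor explicitly. Assuming your data are in dat: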

dat <- transform(dat, hhDomMil = factor(hhDomMil,
                                        levels = c("ETB", "PMA", "PER", "KON",
                                                   "TRA", "DDR", "BUM", "MAT",
                                                   "HED", "EXP")))

This updates dat in place so that hhDomMil is a factor with its levels in the order you want:

> head(dat$hhDomMil)
[1] HED ETB HED ETB PER BUM
Levels: ETB PMA PER KON TRA DDR BUM MAT HED EXP

Note what was happening when R coerced hhDomMil to a factor on its own:

> head(factor(as.character(dat$hhDomMil)))
[1] HED ETB HED ETB PER BUM
Levels: BUM DDR ETB EXP HED KON MAT PER PMA TRA

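The default is to sort the levels alphabetically, which is why the plot came out the way you showed.

The best advice I can give is to format your data properly first and only then try to plot it. Don't rely on automatic or on-the-fly conversions for this; sooner or later they will not give you what you want.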
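To make the difference concrete, here is a minimal sketch. It uses a small made-up vector `x` (not the question's data) to contrast the default level order with explicitly set levels:

```r
# Toy vector standing in for hhDomMil (hypothetical values, not the real data)
x <- c("HED", "ETB", "PER", "BUM", "ETB")

# Default: factor() sorts the levels alphabetically
default_f <- factor(x)
levels(default_f)

# Explicit: the levels follow the order passed in `levels`
wanted <- c("ETB", "PMA", "PER", "KON", "TRA",
            "DDR", "BUM", "MAT", "HED", "EXP")
ordered_f <- factor(x, levels = wanted)
levels(ordered_f)

# The values themselves are unchanged; only the level order differs
identical(as.character(default_f), as.character(ordered_f))
```

Levels with no observations in `x` (such as DDR) are still kept in the factor, so the same level vector can be reused across subsets of the data.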

Answer 1 (score: 12)

I see that your data frame has an order column, which I gather is the ordering you want. So you can do this:

p0 = qplot(factor(kclust), fill = reorder(hhDomMil, order), position = 'fill', 
       data = df1)

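Here are the elements of this code that address your questions:

  1. How can I plot it in this order? reorder
  2. How can I set x so that each bar sits "on" one number? factor(kclust)
  3. How can I separate the bars?
  4. How can I print all the kclust numbers on x? factor(kclust)

I remember from one of your earlier questions that the hhDomMil values correspond to different groups, and I suspect your ordering follows that grouping. In that case, you may want to use this information when choosing the colour palette, so the graph is easier to follow. Here is one way: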

    library(RColorBrewer)  # provides brewer.pal()

    # brewer.pal() returns at least 3 colours, so subset to get 2
    mycols = c(brewer.pal(3, 'Oranges'), brewer.pal(3, 'Greens'),
               brewer.pal(3, 'Blues')[1:2], brewer.pal(3, 'PuRd')[1:2])

    p0 + scale_fill_manual(values = mycols)

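As background on why reorder(hhDomMil, order) gives the wanted order: reorder() sorts the factor levels by a summary of its second argument within each level (the mean, by default). A small sketch with made-up values follows; the data frame `toy` is hypothetical and only mimics the question's columns:

```r
# Hypothetical miniature of the question's data:
# every milieu code carries the same order number on each of its rows
toy <- data.frame(
  hhDomMil = c("HED", "ETB", "PER", "ETB", "PMA", "HED"),
  order    = c(9, 1, 3, 1, 2, 9)
)

# reorder() computes mean(order) per code and sorts the levels by that score
f <- reorder(toy$hhDomMil, toy$order)
levels(f)

# The per-level scores are stored as an attribute on the result
attr(f, "scores")
```

Because each code has a single constant order number, the mean is just that number, so the levels end up in the numeric order of the order column.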

Answer 2 (score: 7)

If you redefine hhDomMil as a factor like this:

o<-c("ETB", "PMA", "PER", "KON", "TRA", "DDR", "BUM", "MAT", "HED", "EXP")
d$hh<-factor(d$hhDomMil,levels=o)

then your plot will come out in the order you want:

ggplot(d,(aes(x=kclust, fill=hh))) +geom_bar(position="fill")
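Putting the pieces from the answers together, something along these lines should cover the four points of the question. This is only a sketch: it assumes ggplot2 is installed, and the small MilDis data frame below is a made-up stand-in (same column names as in the question) so that the snippet runs on its own. The name `milieu_order` is introduced here for illustration.

```r
library(ggplot2)

# Stand-in for the question's MilDis (hypothetical rows, same column names)
MilDis <- data.frame(
  hhDomMil = c("HED", "ETB", "PER", "BUM", "ETB", "TRA", "PMA", "HED"),
  kclust   = c(1L, 1L, 2L, 2L, 4L, 4L, 4L, 15L)
)

# 1. Stack order: set the factor levels explicitly
milieu_order <- c("ETB", "PMA", "PER", "KON", "TRA",
                  "DDR", "BUM", "MAT", "HED", "EXP")
MilDis$hhDomMil <- factor(MilDis$hhDomMil, levels = milieu_order)

# 2. and 4. A discrete x axis gives one labelled bar per cluster number
MilDis$kclust <- factor(MilDis$kclust)

# 3. A white outline, set outside aes(), separates the stacked segments
p <- ggplot(MilDis, aes(x = kclust, fill = hhDomMil)) +
  geom_bar(position = "fill", colour = "white")

p
```

If the stacks come out in the opposite direction from the one you expect, position_fill(reverse = TRUE) can be passed as the position instead of "fill".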