Show percentages by column on a stacked bar chart

Date: 2017-04-19 06:00:15

Tags: r plot ggplot2

I'm trying to plot a stacked bar chart that shows the relative percentage of each group within a column.

Here is an example of my problem, using the built-in mpg dataset:

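library(ggplot2)
library(dplyr)

mpg %>%
  ggplot(aes(x = manufacturer, group = class)) +
  geom_bar(aes(fill = class), stat = "count") +
  # prop is computed by stat_count within each group, i.e. within each class
  geom_text(aes(label = scales::percent(after_stat(prop))),
            stat = "count",
            position = position_stack(vjust = 0.5))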

Here is the output: [plot image not included]

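My problem is that this output shows each percentage relative to a total taken across all manufacturers (with `group = class`, stat_count computes `prop` within each class), rather than the relative percentage within each manufacturer.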

For example, I'd like the first column (audi) to show 83.3% (15/18) for the brown segment (compact) and 16.7% (3/18) for the green segment (midsize).
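To make the desired arithmetic concrete: the denominator has to be each manufacturer's own total, not a total across manufacturers. A minimal base-R sketch, assuming the counts are already aggregated into one row per manufacturer/class pair. The audi counts come from the question; the chevrolet counts (5 2seater, 5 midsize, 9 suv) are the ones in ggplot2's `mpg` and are added only as a second group.

```r
# Aggregated counts, one row per manufacturer/class pair.
# audi: 15 compact + 3 midsize (from the question);
# chevrolet: 5 2seater + 5 midsize + 9 suv (as in ggplot2::mpg).
counts <- data.frame(
  manufacturer = c("audi", "audi", "chevrolet", "chevrolet", "chevrolet"),
  class        = c("compact", "midsize", "2seater", "midsize", "suv"),
  n            = c(15, 3, 5, 5, 9)
)

# ave() returns, for every row, the sum of n over all rows with the same
# manufacturer, so each count is divided by its manufacturer total (18 or 19).
counts$percent <- 100 * counts$n / ave(counts$n, counts$manufacturer, FUN = sum)

# Same label format as used in the answers below.
counts$label <- sprintf("%.1f%%", counts$percent)
print(counts)
```

Each manufacturer's percentages add up to 100, which is exactly the property the stacked labels should have.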

I found a similar question here: How to draw stacked bars in ggplot2 that show percentages based on group?

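But I'd like to know whether there is a simpler way to do this within ggplot2, especially since my real dataset goes through a series of dplyr pipes to massage the data before it is finally piped into ggplot2.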
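One ggplot2-only alternative, sketched here under the assumption of ggplot2 3.3.0 or later (for `after_stat()`); it is a swapped-in technique, not the approach from the linked question. It keeps the question's `stat = "count"` text layer but builds each label from the counted data: each segment's `count` is divided by the summed count of all segments at the same `x` position, i.e. the same manufacturer. No pre-aggregation with dplyr is needed.

```r
library(ggplot2)

# after_stat() is evaluated on the stat's output for the whole layer: one row
# per (x, class) bar segment, with columns such as count and x.
# ave(count, x, FUN = sum) gives the total count per x (manufacturer), so the
# label is the share of each class within its manufacturer.
p <- ggplot(mpg, aes(x = manufacturer, group = class)) +
  geom_bar(aes(fill = class)) +
  geom_text(
    aes(label = after_stat(sprintf("%.1f%%", 100 * count / ave(count, x, FUN = sum)))),
    stat = "count",
    position = position_stack(vjust = 0.5)
  )

p
```

Because both layers are grouped by `class`, the text labels are stacked in the same order as the bar segments.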

2 Answers:

Answer 0 (score: 3)

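Comparing your question with the link you provided, the difference is that the linked answer computes the percentages itself before plotting. So that's what I did. I'm not sure whether this works for your real data.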

library(ggplot2)
library(dplyr)

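mpg %>%
  mutate(manufacturer = as.factor(manufacturer),
         class = as.factor(class)) %>%
  # number of cars per manufacturer/class pair
  group_by(manufacturer, class) %>%
  summarise(count_class = n()) %>%
  # total per manufacturer, and each class's share of it in percent
  group_by(manufacturer) %>%
  mutate(count_man = sum(count_class)) %>%
  mutate(percent = count_class / count_man * 100) %>%
  ggplot() +
  # NOTE: count_man is the manufacturer total repeated on every class row,
  # so stacking y = count_man makes each bar too tall (corrected in the edit below)
  geom_bar(aes(x = manufacturer,
               y = count_man, 
               group = class,
               fill = class), 
           stat = "identity") +
  geom_text(aes(x = manufacturer,
                y = count_man,
                label = sprintf("%0.1f%%", percent)),
            position = position_stack(vjust = 0.5))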

Edit based on the comments:

I made a mistake: I picked the wrong column for y.

library(ggplot2)
library(dplyr)

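mpg %>%
  mutate(manufacturer = as.factor(manufacturer),
         class = as.factor(class)) %>%
  # number of cars per manufacturer/class pair
  group_by(manufacturer, class) %>%
  summarise(count_class = n()) %>%
  # total per manufacturer, and each class's share of it in percent
  group_by(manufacturer) %>%
  mutate(count_man = sum(count_class)) %>%
  mutate(percent = count_class / count_man * 100) %>%
  ungroup() %>%
  # stack the per-class counts, so each bar's height is the manufacturer total
  ggplot(aes(x = manufacturer,
             y = count_class,
             group = class)) +
  geom_bar(aes(fill = class), 
           stat = "identity") +
  # label each segment with its within-manufacturer percentage
  geom_text(aes(label = sprintf("%0.1f%%", percent)),
            position = position_stack(vjust = 0.5))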
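The same preprocessing can be written a little more compactly with dplyr's `count()`, which tallies rows per manufacturer/class pair into a column named `n` (equivalent to `group_by()` followed by `summarise(n())`). A minimal sketch, assuming dplyr 0.7 or later; `pct` is just an illustrative name, and `geom_col()` is ggplot2's shorthand for `geom_bar(stat = "identity")`.

```r
library(dplyr)
library(ggplot2)

# One row per manufacturer/class pair, with the tally in column n.
pct <- mpg %>%
  count(manufacturer, class) %>%
  group_by(manufacturer) %>%
  mutate(percent = 100 * n / sum(n)) %>%   # share within each manufacturer
  ungroup()

# fill = class in the global aes also groups the text layer by class,
# so the labels stack in the same order as the bar segments.
ggplot(pct, aes(x = manufacturer, y = n, fill = class)) +
  geom_col() +
  geom_text(aes(label = sprintf("%0.1f%%", percent)),
            position = position_stack(vjust = 0.5))
```

Since the percentages are ordinary columns, this step slots in at the end of an existing dplyr pipeline before the data reaches ggplot2.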

Answer 1 (score: 1)

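If the plot needs counts and percentages as text on top of the colored bars to help us see the differences, it may be better to show the results as a simple table instead: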

# margin = 2: proportions within each column (manufacturer), shown in percent
round(prop.table(table(mpg$class, mpg$manufacturer), margin = 2), 3) * 100

#             audi chevrolet dodge  ford honda hyundai  jeep land rover lincoln mercury nissan pontiac subaru toyota volkswagen
# 2seater      0.0      26.3   0.0   0.0   0.0     0.0   0.0        0.0     0.0     0.0    0.0     0.0    0.0    0.0        0.0
# compact     83.3       0.0   0.0   0.0   0.0     0.0   0.0        0.0     0.0     0.0   15.4     0.0   28.6   35.3       51.9
# midsize     16.7      26.3   0.0   0.0   0.0    50.0   0.0        0.0     0.0     0.0   53.8   100.0    0.0   20.6       25.9
# minivan      0.0       0.0  29.7   0.0   0.0     0.0   0.0        0.0     0.0     0.0    0.0     0.0    0.0    0.0        0.0
# pickup       0.0       0.0  51.4  28.0   0.0     0.0   0.0        0.0     0.0     0.0    0.0     0.0    0.0   20.6        0.0
# subcompact   0.0       0.0   0.0  36.0 100.0    50.0   0.0        0.0     0.0     0.0    0.0     0.0   28.6    0.0       22.2
# suv          0.0      47.4  18.9  36.0   0.0     0.0 100.0      100.0   100.0   100.0   30.8     0.0   42.9   23.5        0.0
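The `margin` argument of `prop.table()` is what turns this into within-manufacturer percentages: `margin = 2` divides each column by its own column total, while the default `margin = NULL` divides by the grand total. A small base-R sketch contrasting the two, assuming a 4×2 excerpt of the class-by-manufacturer counts (audi and chevrolet as they appear in ggplot2's `mpg`; the reduced class list is a simplification, and `tab` is an illustrative name).

```r
# Rows are classes, columns are manufacturers, as in table(mpg$class, mpg$manufacturer).
# matrix() fills column by column: first audi, then chevrolet.
tab <- matrix(
  c(0, 15, 3, 0,   # audi:      2seater, compact, midsize, suv
    5, 0, 5, 9),   # chevrolet: 2seater, compact, midsize, suv
  nrow = 4,
  dimnames = list(class = c("2seater", "compact", "midsize", "suv"),
                  manufacturer = c("audi", "chevrolet"))
)

# margin = 2: each column divided by its column sum -> shares within a manufacturer.
by_manufacturer <- round(prop.table(tab, margin = 2) * 100, 1)

# Default margin = NULL: every cell divided by the grand total (37 cars here).
overall <- round(prop.table(tab) * 100, 1)

print(by_manufacturer)
print(overall)
```

The `by_manufacturer` columns reproduce the audi and chevrolet columns of the table above, whereas `overall` gives numbers such as 40.5% for audi/compact (15/37), which is the kind of "relative to a total" percentage the question wanted to avoid.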