Bar heights and percentage labels do not match in a bar chart

Time: 2019-01-23 08:06:42

Tags: r ggplot2

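I want to plot customer details by sex, education, and default payment status. However, the bars for the "other" category are drawn larger than the remaining bars.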

# Data link: https://archive.ics.uci.edu/ml/machine-learning-databases/00350/

plot_data5 <- customer.data %>% 
  group_by(EDUCATION,SEX) %>% 
  mutate(group_size = n()) %>%
  group_by(EDUCATION,SEX, DEFAULT_PAYMENT) %>%
  summarise(perc = paste(round(n()*100/max(group_size), digits = 2), 
  "%", sep = ""))


ggplot(plot_data5, aes(x = plot_data5$EDUCATION, y = plot_data5$perc, fill = DEFAULT_PAYMENT))+
  geom_bar(stat = "identity") + 
  geom_text(aes(label = plot_data5$perc),vjust=-.3) +
  facet_wrap(DEFAULT_PAYMENT~SEX,scales = "free") +
  theme(plot.subtitle = element_text(vjust = 1), 
        plot.caption = element_text(vjust = 1)) + 
  labs(y = "% of Customer ") + 
  labs(x = "Default_Payment")

The actual result should look just like this, but with the bars at their true sizes and a continuous y-axis scale.

1 answer:

Answer 0 (score: 1)

There is no need to specify the data frame again inside the aes call of ggplot (write EDUCATION, not plot_data5$EDUCATION). Doing so interferes with the correct assignment of the labels. Furthermore, since you want a continuous y-axis, perc needs to be a continuous variable.

# Packages used by the code in this answer
library(dplyr)
library(ggplot2)

plot_data <- customer.data.small %>% 
  group_by(EDUCATION, SEX) %>% 
  mutate(group_size = n()) %>%
  group_by(EDUCATION, SEX, DEFAULT_PAYMENT) %>%
  summarise(perc = n()/max(group_size)) # Keep perc continuous

ggplot(plot_data, aes(x = EDUCATION, y = perc, fill = DEFAULT_PAYMENT)) +
  geom_bar(stat = "identity") + 
  # Specify the labels with % and rounded in aes directly:
  geom_text(aes(label = paste0(round(100*perc, 2), "%")), vjust = -.3) +
  facet_wrap(DEFAULT_PAYMENT ~ SEX, scales = "free_y") +
  # Use scales::percent to have percentages on the y-axis.
  # Expand makes sure you can still read the labels
  scale_y_continuous(labels = scales::percent, expand = c(0.075, 0)) +
  theme(plot.subtitle = element_text(vjust = 1), 
        plot.caption = element_text(vjust = 1)) + 
  labs(y = "% of Customer ") + 
  labs(x = "Default_Payment")

I find this representation of the data highly misleading! Although the x-axis shows EDUCATION, it is still labelled "Default_Payment". It is also not clear from the plot why the percentages in each group do not add up to 100%, which may confuse the reader. Here is a suggestion on how to improve the plot:

plot_data2 <- customer.data.small %>% 
  mutate_at(c("DEFAULT_PAYMENT", "EDUCATION", "SEX"), factor) %>% 
  group_by(EDUCATION, SEX) %>% 
  mutate(group_size = n()) %>%
  group_by(EDUCATION, SEX, DEFAULT_PAYMENT) %>%
  summarise(perc = n()/max(group_size))

ggplot(plot_data2, aes(x = EDUCATION, y = perc, fill = DEFAULT_PAYMENT)) +
  geom_bar(stat = "identity", 
           position = position_dodge2(width = 0.9, preserve = "single")) +
  geom_text(aes(label = paste0(round(100 * perc, 2), "%")),
            vjust = -.3,
            position = position_dodge(0.9)) +
  facet_wrap( ~ SEX, labeller = label_both) +
  scale_y_continuous(labels = scales::percent) +
  theme(plot.subtitle = element_text(vjust = 1),
        plot.caption = element_text(vjust = 1)) +
  labs(y = "% of Customer ") +
  labs(x = "Education")

Data
I used a small subset of the data you linked to. It is given here in a reproducible format that anyone can copy and paste into their own R session without having to download the data set.

This is how I created the data:

customer.data.small <- 
  structure(list(ID = 1:100, 
                 EDUCATION = c(2, 2, 2, 2, 2, 1, 1, 2, 3, 3, 3, 1, 2, 2, 1, 3, 1, 1, 1, 1, 3, 2, 2, 1, 1, 3, 1, 3, 3, 1, 1, 2, 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 2, 2, 1, 1, 1, 5, 2, 1, 3, 3, 2, 1, 1, 1, 3, 2, 1, 2, 3, 2, 1, 2, 2, 1, 2, 1, 3, 5, 1, 2, 2, 1, 1, 2, 3, 1, 2, 2, 3, 1, 3, 2, 3, 2, 1, 2, 1, 3, 1, 1, 1, 2, 2, 2, 1, 1, 3, 2), 
                 SEX = c(2, 2, 2, 2, 1, 1, 1, 2, 2, 1, 2, 2, 2, 1, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 2, 1, 2, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, 1, 1, 2, 2, 2, 2, 1, 1, 1, 2, 1), 
                 DEFAULT_PAYMENT = c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1)), 
            row.names = c(NA, -100L), class = c("tbl_df", "tbl", "data.frame"))
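
As a small base-R illustration of the answer's point about keeping perc continuous, the sketch below uses made-up shares (the share vector is hypothetical and not taken from the data set). It shows that pasting "%" onto a number yields character strings, which sort alphabetically and which ggplot2 would treat as a discrete axis. Building the label text separately leaves the plotted values numeric.

```r
# Hypothetical shares for illustration only (not from the credit data set)
share <- c(0.095, 0.25, 1)

# What the question's summarise() effectively does: numbers become strings
perc_chr <- paste0(round(100 * share, 2), "%")
print(is.character(perc_chr))   # TRUE
print(sort(perc_chr))           # "100%" "25%" "9.5%" (alphabetical, not by size)

# What the answer does instead: keep the numeric value for the y aesthetic
# and build the "%" text only where a label is needed
perc <- share
bar_labels <- paste0(round(100 * perc, 2), "%")
print(is.numeric(perc))         # TRUE
print(bar_labels)
```

With a character y, each distinct string gets its own equally spaced position on the axis, which would explain bars whose heights do not follow their labels.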