geom_bar(): plot the frequency of a subgroup relative to the total number of observations

Time: 2020-08-09 15:00:30

Tags: r ggplot2 geom-bar

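I'm fairly new to R and have a question. I have a data frame (my.data) with two columns: "PHENO" is a factor with two levels (1 or 2), and "bins" is numeric (natural numbers from 1 to 10). I am trying to plot the frequency of PHENO == 2 in each bin as a percentage, where 100% is the total number of observations (levels 1 + 2).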

This is what I did, but here 100% is not the total number of observations:

ggplot(data = subset(my.data, PHENO == 2)) + 
  geom_bar(mapping = aes(x = as.factor(bins), y = ..prop.., group = 1), stat = "count") +
  scale_y_continuous(labels = scales::percent_format(), limits = c(0,0.15)) +
  geom_hline(yintercept = 0.05, linetype="dashed", color = 'blue', size = 1) + 
  annotate(geom = "text", label = 'Prevalence 5%', x = 1.5, y = 0.05, vjust = -1, col = 'blue')

In addition, I tried to add frequency labels on top of the bars, but it does not work:

geom_text(aes(label = as.factor(bins)), position=position_dodge(width=0.9), vjust = -0.25)

Thanks for your help.

1 Answer:

Answer 0 (score: 0)

Is this what you need?

library(dplyr)
library(ggplot2)

df %>% 
  count(PHENO, bins) %>%                    # observations per PHENO/bin combination
  mutate(Percent = n / sum(n) * 100) %>%    # denominator: all observations (levels 1 + 2)
  filter(PHENO == "2") %>%                  # filter only after computing Percent, so 100% stays all observations
  ggplot(aes(y = Percent, x = bins)) +
  geom_col() +
  geom_hline(yintercept = 5, linetype = "dashed", color = 'blue', size = 1) +
  geom_text(aes(label = sprintf("%.1f%%", Percent)), vjust = -0.25)  # label each bar with its frequency
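
The order of operations in the pipeline above is what matters: Percent is computed before the filter on PHENO, so the denominator is every row rather than only the PHENO == 2 rows. A minimal base-R sketch of the two denominators, using a small hypothetical data frame `toy` (an assumption for illustration, not the asker's data):

```r
# Hypothetical toy data: 10 observations, 4 of them with PHENO == 2
toy <- data.frame(
  PHENO = factor(c(1, 1, 2, 1, 2, 1, 1, 2, 1, 2)),
  bins  = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
)

# Number of PHENO == 2 rows in each bin
counts_pheno2 <- table(toy$bins[toy$PHENO == "2"])

# Subset first: the denominator is only the PHENO == 2 rows,
# so the bars always add up to 100%
pct_subset_first <- counts_pheno2 / sum(toy$PHENO == "2") * 100

# Percent first, filter last: the denominator is every row (levels 1 + 2)
pct_of_total <- counts_pheno2 / nrow(toy) * 100

print(pct_subset_first)
print(pct_of_total)
```

With the second denominator the bars add up to the overall share of PHENO == 2 in the data, which is what the question asks for.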

For illustration, the pipeline above uses the following mock data `df`, which may not match your real data:

df <- structure(list(PHENO = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 
2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L), .Label = c("1", 
"2"), class = "factor"), bins = c(1, 2, 4, 5, 7, 8, 9, 5, 2, 
3, 6, 9, 10, 5, 6, 6, 6, 4)), class = "data.frame", row.names = c(NA, 
-18L))

Result:

[Image: bar_plot]
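
If you would rather stay with geom_bar() as in the question, one possible sketch is to divide the computed count by the total number of rows inside after_stat(). This assumes ggplot2 3.3.0 or later (for after_stat()) and that the scales package is installed. The name `total_n` is introduced here for illustration, and the data frame is the same mock data as above, written compactly:

```r
library(ggplot2)

# Same mock data as in the answer, written compactly
df <- data.frame(
  PHENO = factor(c(1, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 1, 1, 2)),
  bins  = c(1, 2, 4, 5, 7, 8, 9, 5, 2, 3, 6, 9, 10, 5, 6, 6, 6, 4)
)

# Denominator: all observations (levels 1 + 2); illustrative name
total_n <- nrow(df)

p <- ggplot(subset(df, PHENO == 2), aes(x = factor(bins))) +
  # Bar height = PHENO == 2 count in the bin divided by the total row count
  geom_bar(aes(y = after_stat(count / total_n))) +
  # The same computed value, formatted as a percentage label above each bar
  geom_text(
    aes(y = after_stat(count / total_n),
        label = after_stat(scales::percent(count / total_n, accuracy = 0.1))),
    stat = "count", vjust = -0.25
  ) +
  scale_y_continuous(labels = scales::percent_format()) +
  geom_hline(yintercept = 0.05, linetype = "dashed", color = "blue") +
  labs(x = "bins", y = "PHENO == 2 (share of all observations)")

print(p)
```

Here the y axis is a proportion (0 to 1) formatted as a percentage, so the reference line sits at 0.05 rather than 5.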