Mean values on a grouped boxplot in R

Time: 2017-04-12 06:52:11

Tags: r ggplot2 boxplot

I want to create a grouped boxplot from my data:

X     variable value
Cat1  Var1     10
Cat2  Var1     8
Cat3  Var1     7
Cat4  Var1     15
Cat1  Var2     4
Cat2  Var2     3
Cat3  Var2     4
Cat4  Var2     1

I was able to produce the plot with:

ggplot() +
    geom_boxplot(aes(x=dataFiltered$X, y=dataFiltered$value, color=dataFiltered$variable))+
    ylim(c(-5, 15))

Now I want to add extra points that show the mean of each boxplot. I tried:

ggplot() +
    geom_boxplot(aes(x=dataFiltered$X, y=dataFiltered$value, color=dataFiltered$variable))+
    ylim(c(-5, 15))+
    geom_point(stat="identity", aes(x=means$`dataFiltered$X`, y=means$`dataFiltered$value`), col = "red",pch=18)

But this draws four values at the same X position (the red points in the resulting plot).

I tried using facet_wrap, but I could not get past this error:

ggplot() +
    geom_boxplot(aes(x=dataFiltered$X, y=dataFiltered$value, color=dataFiltered$variable))+
    ylim(c(-5, 15))+
    geom_point(stat="identity", aes(x=means$`dataFiltered$X`, y=means$`dataFiltered$value`), col = "red",pch=18) +
    facet_wrap(~means$`dataFiltered$variable`, scales='free')

Error in layout_base... At least one layer must contain all variables used for facetting.

Is there a way to add mean values to a grouped boxplot?

1 answer:

Answer 0: (score: 2)

Try adding a stat_summary() call:

library(dplyr)
library(tidyr)
library(ggplot2)

# Rebuild the example data: one raw text row per observation
df <- data.frame(V1 = c(
  "Cat1  Var1     10",
  "Cat2  Var1     8",
  "Cat3  Var1     7",
  "Cat4  Var1     15",
  "Cat1  Var2     4",
  "Cat2  Var2     3",
  "Cat3  Var2     4",
  "Cat4  Var2     1"), stringsAsFactors = FALSE)

# Split each row on whitespace into three columns and make value numeric
df2 <- df %>%
        separate(V1, c("X", "variable", "value"), sep="\\s+") %>%
        mutate(value = as.integer(value))

# fun replaces fun.y, which is deprecated since ggplot2 3.3.0
ggplot(df2, aes(x=X, y=value, color=variable)) +
        geom_boxplot()+
        ylim(c(-5, 15)) +
        stat_summary(geom = "point", fun = "mean", colour = "red", size = 4)
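To see which values the red points in this first plot represent, the means can be computed by hand. This is a minimal sketch, assuming that the fixed colour = "red" replaces the colour mapping in that layer, so that the summary is grouped by X alone; the name means_by_x is only illustrative.

```r
# Same example data as in the question, built directly as a data frame
df2 <- data.frame(
  X        = rep(c("Cat1", "Cat2", "Cat3", "Cat4"), times = 2),
  variable = rep(c("Var1", "Var2"), each = 4),
  value    = c(10, 8, 7, 15, 4, 3, 4, 1)
)

# One mean per X category, pooling Var1 and Var2 together
means_by_x <- aggregate(value ~ X, data = df2, FUN = mean)
print(means_by_x)
```

Under that assumption there is one red point per category, not one per box.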

If you want one mean per group, try the following:

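# group=variable gives one mean per box; a dodge width of 0.75 matches
# the default dodging of geom_boxplot so the points sit on their boxes
ggplot(df2, aes(x=X, y=value, color=variable)) +
        geom_boxplot()+
        ylim(c(-5, 15)) +
        stat_summary(geom = "point", aes(group=variable, col=variable), 
            fun = "mean", size = 4, position=position_dodge(width=0.75))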
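The approach from the question, a separate data frame of precomputed means, can also be made to work. The sketch below assumes the means table keeps the plain column names X, variable and value (unlike the backtick-quoted names in the question), so the layer can inherit the plot's mappings and be dodged the same way as the boxes.

```r
library(ggplot2)

# Same example data as in the question, built directly as a data frame
df2 <- data.frame(
  X        = rep(c("Cat1", "Cat2", "Cat3", "Cat4"), times = 2),
  variable = rep(c("Var1", "Var2"), each = 4),
  value    = c(10, 8, 7, 15, 4, 3, 4, 1)
)

# One mean per X/variable combination (base R, no extra packages)
means <- aggregate(value ~ X + variable, data = df2, FUN = mean)

p <- ggplot(df2, aes(x = X, y = value, color = variable)) +
  geom_boxplot() +
  ylim(c(-5, 15)) +
  # The points inherit x, y and color from the plot; grouping by variable
  # plus the dodge places each mean on its own box
  geom_point(data = means, aes(group = variable),
             shape = 18, size = 4,
             position = position_dodge(width = 0.75))
print(p)
```

Passing data = means and referring to bare column names inside aes(), not to means$..., is what lets ggplot2 line the points up with the boxes.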

Keep in mind that these plots can be misleading when the sample size is small.
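A quick way to check how many observations stand behind each box is to count the rows per X/variable combination. This is a small sketch on the example data; the name n_per_box is only illustrative.

```r
# Same example data as in the question, built directly as a data frame
df2 <- data.frame(
  X        = rep(c("Cat1", "Cat2", "Cat3", "Cat4"), times = 2),
  variable = rep(c("Var1", "Var2"), each = 4),
  value    = c(10, 8, 7, 15, 4, 3, 4, 1)
)

# Number of observations behind each box
n_per_box <- aggregate(value ~ X + variable, data = df2, FUN = length)
names(n_per_box)[names(n_per_box) == "value"] <- "n"
print(n_per_box)
```

In this example every combination holds a single observation, so each box collapses to a line and its mean is just that one value.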