How do I plot weighted means on a boxplot?

Time: 2013-04-24 23:58:06

Tags: r ggplot2 boxplot

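After searching for a solution and experimenting on my own, I am asking for help with displaying weighted means on a boxplot (I have also tried cross-posting this to the ggplot2 mailing list).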

I have provided a toy example below.

#data

value <- c(5, 7, 8, 6, 7, 9, 10, 6, 7, 10)
category <- c("one", "one", "one", "two", "two", "two",
              "three", "three", "three","three")
weight <- c(1, 1.2, 2, 3, 2.2, 2.5, 1.8, 1.9, 2.2, 1.5)
df <- data.frame(value, category, weight)

#unweighted means by category
library(plyr)  # provides ddply() and .()
ddply(df, .(category), summarize, mean=round(mean(value, na.rm=TRUE), 2))

  category mean
1      one 6.67
2    three 8.25
3      two 7.33

#weighted means by category
library(Hmisc)  # provides wtd.mean()
ddply(df, .(category), summarize, 
          wmean=round(wtd.mean(value, weight, na.rm=TRUE), 2))

  category wmean
1      one  7.00
2    three  8.08
3      two  7.26

#unweighted means added to boxplot (which works fine)
library(ggplot2)
#note: ggplot2 >= 3.3.0 uses `fun`; older versions use `fun.y`
ggplot(df, aes(x = category, y = value, weight = weight)) + 
   geom_boxplot(width=0.6,  colour = I("#3366FF")) + 
   stat_summary( fun ="mean", geom ="point", shape = 23, 
                 size = 3, fill ="white") 

My question is: how do I display the weighted means on the boxplot instead of the unweighted means?

1 Answer:

Answer 0: (score: 4)

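You can save the weighted means in a new data frame and then use it to draw the points with geom_point(). The argument inherit.aes=FALSE ensures that the points are drawn without inheriting the aesthetics supplied in the ggplot() call (such as weight, which does not exist in the new data frame).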

library(Hmisc)
library(plyr)
library(ggplot2)
df.wm<-ddply(df, .(category), summarize, 
             wmean=round(wtd.mean(value, weight, na.rm=TRUE), 2))

ggplot(df, aes(x = category, y = value, weight = weight)) + 
  geom_boxplot(width=0.6,  colour = I("#3366FF")) + 
  geom_point(data=df.wm,aes(x=category,y=wmean),shape = 23, 
             size = 3, fill ="white",inherit.aes=FALSE)
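
If you would rather not depend on Hmisc and plyr, the same idea can probably be sketched with base R's weighted.mean(). This is only an illustration under the assumption that the toy data from the question is used as-is; the split()/lapply() approach is my own substitution for ddply(), not part of the original answer. Rendering a boxplot with a weight aesthetic may also require the quantreg package to be installed.

```r
library(ggplot2)

# Same toy data as in the question
df <- data.frame(
  value    = c(5, 7, 8, 6, 7, 9, 10, 6, 7, 10),
  category = c("one", "one", "one", "two", "two", "two",
               "three", "three", "three", "three"),
  weight   = c(1, 1.2, 2, 3, 2.2, 2.5, 1.8, 1.9, 2.2, 1.5)
)

# Weighted mean per category with base R only:
# split the data by category, summarise each piece, bind the rows back together
df.wm <- do.call(rbind, lapply(split(df, df$category), function(d) {
  data.frame(
    category = d$category[1],
    wmean    = round(weighted.mean(d$value, d$weight, na.rm = TRUE), 2)
  )
}))

# Boxplot from the raw data, weighted means overlaid from the summary data frame
p <- ggplot(df, aes(x = category, y = value, weight = weight)) +
  geom_boxplot(width = 0.6, colour = "#3366FF") +
  geom_point(data = df.wm, aes(x = category, y = wmean),
             shape = 23, size = 3, fill = "white", inherit.aes = FALSE)

print(p)  # draws the plot; may need quantreg for the weighted box statistics
```

df.wm should contain the same weighted means as the ddply() version above (7.00, 8.08 and 7.26 for one, three and two).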
