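ggplot2: geom_boxplot with custom quantiles and outliers

Time: 2014-02-25 19:44:35

Tags: r ggplot2

I have a dataset containing data from 100 simulated train runs in a network with 4 trains and 6 stations, including the arrival lateness of each train at each station. My data looks like this: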

MyData <- data.frame(
  Simulation = rep(sort(rep(1:100, 6)), 4),
  Train_number = sort(rep(c(100, 102, 104, 106), 100*6)), 
  Stations = rep(c("ST_1", "ST_2", "ST_3", "ST_4", "ST_5", "ST_6"), 100*4),
  Arrival_Lateness = c(rep(0, 60), rexp(40, 1), rep(0, 60), rexp(40, 2), rep(0, 60), rexp(40, 3), rep(0, 60), rexp(40, 5))
  )

I now create a boxplot for each train and station using custom quantiles (thanks to jlhoward):

f <- function(x) {
  r <- quantile(x, probs = c(0.05, 0.25, 0.5, 0.75, 0.95))
  names(r) <- c("ymin", "lower", "middle", "upper", "ymax")
  r
}
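
As a quick sanity check of what `f` hands to the boxplot geom, here is a minimal base-R sketch on a made-up vector (`toy` is hypothetical and not part of the data above). The five names are the ones the boxplot geom uses for its whiskers, hinges and median.

```r
# Same summary function as above: 5%, 25%, 50%, 75% and 95% quantiles
f <- function(x) {
  r <- quantile(x, probs = c(0.05, 0.25, 0.5, 0.75, 0.95))
  names(r) <- c("ymin", "lower", "middle", "upper", "ymax")
  r
}

toy <- 0:100   # hypothetical toy data with evenly spaced values
f(toy)         # whiskers end at the 5% and 95% quantiles instead of 1.5 * IQR
```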

library(ggplot2)

ggplot(MyData, aes(factor(Stations), Arrival_Lateness, fill = factor(Train_number))) + 
  stat_summary(fun.data = f, geom="boxplot", position="dodge")

The result looks quite nice.

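What I'm missing now are the outliers. I want to plot the top 5% of the observations for each train/station combination on top of each boxplot. What I tried is this (inspired by this question):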

q <- function(x) {
  subset(x, quantile(x, 0.95) < x)
}
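
To see what `q` returns on its own, here is a small base-R sketch on made-up vectors (both inputs are hypothetical). Note that the length of the result depends on the data, and that it can be empty when every value in a group is tied, for example a group in which all lateness values are 0.

```r
# Same helper as above: keep only values strictly above the 95% quantile
q <- function(x) {
  subset(x, quantile(x, 0.95) < x)
}

q(0:100)       # the values above the 95% quantile of 0..100
q(rep(0, 10))  # all values tied: nothing lies strictly above the quantile
```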

ggplot(MyData, aes(factor(Stations), Arrival_Lateness, fill = factor(Train_number))) + 
  stat_summary(fun.data = f, geom="boxplot", position="dodge") + 
  stat_summary(fun.y = q, geom="point", position="dodge")

I get the message "ymax not defined: adjusting position using y instead", and the resulting plot is obviously not what I want.

1 Answer:

Answer 0: (score: 6)

Like this?

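# Use the same explicit dodge width on both layers so the points line up with their boxes.
# Note: ggplot2 >= 3.3.0 deprecates fun.y in favour of fun.
ggplot(MyData, aes(factor(Stations), Arrival_Lateness, 
                   fill = factor(Train_number))) + 
  stat_summary(fun.data = f, geom="boxplot", 
               position=position_dodge(1))+
  stat_summary(aes(color=factor(Train_number)),fun.y = q, geom="point", 
               position=position_dodge(1))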

IMHO, this version is a bit easier to read:

ggplot(MyData, aes(factor(Train_number), Arrival_Lateness, 
               fill = factor(Train_number))) + 
  stat_summary(fun.data = f, geom="boxplot",
               position=position_dodge(1))+
  stat_summary(aes(color=factor(Train_number)),fun.y = q, geom="point", 
               position=position_dodge(1))+
  facet_grid(.~Stations, scales="free")+
  theme(axis.text.x=element_text(angle=-90,hjust=1,vjust=0.2))+
  labs(x="Train Number")

Edit (in response to OP's comment)

ggplot(MyData, aes(factor(Train_number), Arrival_Lateness, 
                   fill = factor(Train_number))) + 
  stat_summary(fun.data = f, geom="boxplot",
               position=position_dodge(1))+
  stat_summary(aes(color=factor(Train_number)),fun.y = q, geom="point", 
               position=position_dodge(1))+
  facet_grid(.~Stations, scales="free")+
  theme(axis.text.x=element_blank(), axis.ticks.x=element_blank())+
  scale_fill_discrete("Train")+scale_color_discrete("Train")+
  labs(x="")

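To turn off the x-axis text and tick marks, we use theme(...=element_blank()). To turn off the axis label, use labs(x=""). Also, the fill and color scales must have the same name, otherwise they are displayed as separate legends.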
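
An alternative to returning several values from a summary function is to compute the outliers up front and plot them as an ordinary point layer. The following is a minimal base-R sketch of that idea, not part of the answer above; `toy`, `cutoff` and `outliers` are hypothetical names, and the toy data merely stands in for `MyData`.

```r
# Hypothetical toy data: two groups with known, evenly spaced values
toy <- data.frame(
  group = rep(c("a", "b"), each = 101),
  value = c(0:100, 2 * (0:100))
)

# Per-group 95% quantile, repeated so it lines up with the rows of `toy`
cutoff <- ave(toy$value, toy$group, FUN = function(v) quantile(v, 0.95))

# Keep only the observations strictly above their own group's cutoff
outliers <- toy[toy$value > cutoff, ]
outliers
```

With the real data the grouping would presumably be the combination of `Stations` and `Train_number` (for example via `interaction()`), and the resulting data frame could be handed to a point layer that uses the same dodge width as the boxes.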