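I am trying to use geom_boxplot together with scale_y_log10 to draw boxplots by group for some log-normally distributed data. When I checked the plots against the actual values of the median and the other quartiles, I found that some of the boxplots are drawn incorrectly: the median and/or the other quartiles are not in the right place. After a lot of testing, I realized that when geom_boxplot and scale_y_log10 are used together, only the groups that contain two values give wrong plots.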
Here is the code for a test example:
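# stat_summary receives the y values after the log10 scale transformation,
# so back-transform, take the arithmetic mean, and return it on the log10 scale.
f1 <- function(x) {
  log10(mean(10 ^ x))
}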
library(ggplot2)
test1 <- data.frame("value" = c(3, 45, 2, 100),
                    "field" = c("a", "a", "a", "a"),
                    "group" = c("1", "1", "2", "2"))
test2 <- data.frame("value" = c(3, 10, 45, 2, 70, 100),
                    "field" = c("a", "a", "a", "a", "a", "a"),
                    "group" = c("1", "1", "1", "2", "2", "2"))
tapply(test1$value, test1$group, median, na.rm = TRUE)
tapply(test1$value, test1$group, mean, na.rm = TRUE)
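# test1: blue lines mark the group medians, red lines the group means
# (with two values per group, both are 24 and 51).
ggplot(test1, aes(x = field, y = value)) + geom_boxplot() + facet_grid(~ group) +
  geom_hline(yintercept = c(24, 51), colour = "blue", linetype = 2) +
  geom_hline(yintercept = c(24, 51), colour = "red", linetype = 2) +
  scale_y_log10() +
  # `fun` replaces the deprecated `fun.y` argument (ggplot2 >= 3.3.0)
  stat_summary(fun = f1, geom = "point", shape = 1, size = 3, color = "red",
               fill = "red") +
  theme(legend.position = "none") +
  scale_fill_brewer(palette = "Set3")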
tapply(test2$value, test2$group, median, na.rm = TRUE)
tapply(test2$value, test2$group, mean, na.rm = TRUE)
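# test2: blue lines mark the group medians (10, 70), red lines the group means (19.3, 57.3).
ggplot(test2, aes(x = field, y = value)) + geom_boxplot() + facet_grid(~ group) +
  geom_hline(yintercept = c(10, 70), colour = "blue", linetype = 2) +
  geom_hline(yintercept = c(19.3, 57.3), colour = "red", linetype = 2) +
  scale_y_log10() +
  # `fun` replaces the deprecated `fun.y` argument (ggplot2 >= 3.3.0)
  stat_summary(fun = f1, geom = "point", shape = 1, size = 3, color = "red",
               fill = "red") +
  theme(legend.position = "none") +
  scale_fill_brewer(palette = "Set3")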
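As you can see if you run the example above, test1 (two values per group) gives boxplots with the wrong median, while test2 (three values per group) gives correct boxplots.
Any idea why this happens?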
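A quick numeric check that may help narrow this down. It is only a sketch and rests on one assumption: that the boxplot statistics are computed on the log10-transformed values (ggplot2 documents that scale transformations are applied before statistics) and are then displayed back in original units. The helper `median_on_log10` is hypothetical and not part of ggplot2; it only mimics that assumed order of operations in base R.

```r
# Hypothetical helper (not a ggplot2 function): take the median on the
# log10 scale, then back-transform the result to the original units.
median_on_log10 <- function(x) 10 ^ median(log10(x))

test1 <- data.frame(value = c(3, 45, 2, 100),
                    group = c("1", "1", "2", "2"))
test2 <- data.frame(value = c(3, 10, 45, 2, 70, 100),
                    group = c("1", "1", "1", "2", "2", "2"))

# Two values per group: the median is the midpoint of the two values,
# so it depends on the scale on which the midpoint is taken.
med_raw1 <- tapply(test1$value, test1$group, median)           # 24 and 51
med_log1 <- tapply(test1$value, test1$group, median_on_log10)  # about 11.6 and 14.1

# Three values per group: the median is the middle observation itself,
# which a monotone transformation such as log10 does not change.
med_raw2 <- tapply(test2$value, test2$group, median)           # 10 and 70
med_log2 <- tapply(test2$value, test2$group, median_on_log10)  # 10 and 70 (up to rounding)

print(rbind(raw = med_raw1, log10_scale = med_log1))
print(rbind(raw = med_raw2, log10_scale = med_log2))
```

If the median line in the test1 boxplots sits near 11.6 and 14.1 rather than 24 and 51, that would be consistent with the assumption. In that case, one thing to try is replacing `scale_y_log10()` with `coord_trans(y = "log10")`, which transforms the axis after the statistics have been computed.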