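I would like to add the results of a Tukey HSD post-hoc test to a ggplot2 boxplot. This SO answer contains a manual example of what I want (i.e., the letters on the plot were added by hand; groups that share a letter are statistically indistinguishable, p > whatever threshold is chosen).
Is there an automatic function that adds these letters to a boxplot based on an AOV and a Tukey HSD post-hoc analysis?
I don't think such a function would be hard to write. It would look something like this: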
library(ggplot2)

set.seed(0)
lev <- gl(3, 10)
y <- c(rnorm(10), rnorm(10) + 0.1, rnorm(10) + 3)
d <- data.frame(lev=lev, y=y)

p_base <- ggplot(d, aes(x=lev, y=y)) + geom_boxplot()

a <- aov(y~lev, data=d)
tHSD <- TukeyHSD(a)

# Function to generate a data frame of factor levels and corresponding labels
generate_label_df <- function(HSD, factor_levels) {
  comparisons <- rownames(HSD$lev)
  p.vals <- HSD$lev[ , "p adj"]

  ## Somehow create a vector of letters, one for each factor level,
  ## generated using `comparisons` and `p.vals` (this is the missing piece)
  labels <- rep(NA_character_, length(factor_levels))  # placeholder only

  letter_df <- data.frame(lev=factor_levels, labels=labels)
  letter_df
}

# Add the labels to the plot
p_base +
  geom_text(data=generate_label_df(tHSD, levels(d$lev)), aes(x=lev, y=0, label=labels))
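I realize that the TukeyHSD object has a plot method, and that there is another package (which I can't seem to find right now) that does what I describe in base graphics, but I would much prefer to do this in ggplot2.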
Answer 0 (score: 12)
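You can use 'multcompLetters' from the 'multcompView' package to generate letters for the homogeneous groups after a Tukey HSD test. From there, it is a matter of extracting the group label for each level of the factor tested in the Tukey HSD, together with the top of each box as displayed in the boxplot (the code below uses the maximum of the five-number summary), so that the labels can be placed just above that level.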
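As a side check before the full code, it may help to look at what TukeyHSD returns for the example data, since that is what gets passed on to multcompLetters. The following is a minimal base R sketch (no extra packages); the variable name p_adj is only illustrative.

```r
# Base R only: inspect the structure of a TukeyHSD result for the example data.
set.seed(0)
lev <- gl(3, 10)
y <- c(rnorm(10), rnorm(10) + 0.1, rnorm(10) + 3)
d <- data.frame(lev = lev, y = y)

tHSD <- TukeyHSD(aov(y ~ lev, data = d))

# The result holds one matrix per model term; each row is a pairwise comparison.
colnames(tHSD$lev)  # "diff" "lwr" "upr" "p adj"
rownames(tHSD$lev)  # "2-1" "3-1" "3-2"

# Named vector of adjusted p-values. The names are hyphen-separated level
# pairs, which is the form multcompLetters() works from.
p_adj <- tHSD$lev[, "p adj"]
```

Because the comparison names are split on the hyphen, factor levels that themselves contain a hyphen are likely to need renaming first.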
library(plyr)
library(ggplot2)
library(multcompView)

set.seed(0)
lev <- gl(3, 10)
y <- c(rnorm(10), rnorm(10) + 0.1, rnorm(10) + 3)
d <- data.frame(lev=lev, y=y)

a <- aov(y~lev, data=d)
tHSD <- TukeyHSD(a, ordered = FALSE, conf.level = 0.95)

generate_label_df <- function(HSD, flev){
  # Extract labels and factor levels from Tukey post-hoc
  # (column 4 of the TukeyHSD matrix is "p adj", the adjusted p-values)
  Tukey.levels <- HSD[[flev]][,4]
  Tukey.labels <- multcompLetters(Tukey.levels)['Letters']
  plot.labels <- names(Tukey.labels[['Letters']])

  # Get the maximum of Tukey's five-number summary (i.e. the group maximum) and add a bit of
  # space to buffer between that value and the label placement.
  # Note: this reads the data frame `d` from the enclosing environment.
  boxplot.df <- ddply(d, flev, function (x) max(fivenum(x$y)) + 0.2)

  # Create a data frame out of the factor levels and Tukey's homogeneous group letters
  plot.levels <- data.frame(plot.labels, labels = Tukey.labels[['Letters']],
                            stringsAsFactors = FALSE)

  # Merge it with the label heights (ddply names the computed column V1)
  labels.df <- merge(plot.levels, boxplot.df, by.x = 'plot.labels', by.y = flev, sort = FALSE)

  return(labels.df)
}
Generate the ggplot:
p_base <- ggplot(d, aes(x=lev, y=y)) + geom_boxplot() +
  geom_text(data = generate_label_df(tHSD, 'lev'), aes(x = plot.labels, y = V1, label = labels))
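Note that generate_label_df reads d from the global environment and uses plyr only to compute the per-group label height. If that dependency is unwanted, the height step alone can be sketched in base R as below. The helper name label_heights and its pad argument are hypothetical and not part of the answer; the output columns are named plot.labels and V1 only to mirror the names used in the geom_text call.

```r
# Hypothetical helper (not from the answer): per-group label heights in base R.
# `data` is a data frame; `group` and `value` are column names given as strings;
# `pad` is the vertical gap between the group maximum and the label.
label_heights <- function(data, group, value, pad = 0.2) {
  # max(fivenum(x)) equals max(x) when there are no missing values,
  # so the plain maximum is used here.
  tops <- tapply(data[[value]], data[[group]], max)
  data.frame(plot.labels = names(tops),
             V1 = as.numeric(tops) + pad,
             stringsAsFactors = FALSE)
}

# Same example data as in the answer.
set.seed(0)
d <- data.frame(lev = gl(3, 10),
                y = c(rnorm(10), rnorm(10) + 0.1, rnorm(10) + 3))
heights <- label_heights(d, "lev", "y")
```

Under these assumptions, heights could be merged with the letters data frame on plot.labels in place of the ddply result, which would also let the data frame be passed in explicitly rather than picked up from the global environment.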