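We are teaching a statistics course for biology students and trying to use R as the platform for computation and data visualization. As far as possible, we would like to avoid extra packages and anything very "fancy" in R; the focus of the course is statistics, not programming. However, we have not found a very good way to produce an error bar plot in R for a two-factor ANOVA design. We use the ggplot2 package for the plot, and although it has a built-in stat_summary method that generates 95% CI error bars, the way these are computed may not always be the right one. Below, I go through the ANOVA by hand and compute the 95% CI by hand as well (with the standard error estimated from the total residual variance, not just the within-group variance that ggplot's summary method would use). At the end comes the actual plot.
So the question is: is there an easier/faster/simpler way to do all of this?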
# LIZARD LENGTH DATA
island.1 <- c(0.2, 5.9, 6.1, 6.5)
island.2 <- c(5.6, 14.8, 15.5, 16.4)
island.3 <- c(0.8, 3.9, 4.3, 4.9)
sex.codes <- c("Male", "Female", "Male", "Female")
# PUTTING DATA TOGETHER IN A DATA FRAME
df.1 <- data.frame(island.1, island.2, island.3, sex.codes)
# MELTING THE DATA FRAME INTO LONG FORM
library(reshape)
df.2 <- melt(df.1, id.vars = "sex.codes")
# MEAN BY CELL
mean.island1.male <- with(df.2, mean(value[variable == "island.1" & sex.codes == "Male"]))
mean.island1.female <- with(df.2, mean(value[variable == "island.1" & sex.codes == "Female"]))
mean.island2.male <- with(df.2, mean(value[variable == "island.2" & sex.codes == "Male"]))
mean.island2.female <- with(df.2, mean(value[variable == "island.2" & sex.codes == "Female"]))
mean.island3.male <- with(df.2, mean(value[variable == "island.3" & sex.codes == "Male"]))
mean.island3.female <- with(df.2, mean(value[variable == "island.3" & sex.codes == "Female"]))
# ADDING CELL MEANS TO DATA FRAME
df.2$means[df.2$variable == "island.1" & df.2$sex.codes == "Male"] <- mean.island1.male
df.2$means[df.2$variable == "island.1" & df.2$sex.codes == "Female"] <- mean.island1.female
df.2$means[df.2$variable == "island.2" & df.2$sex.codes == "Male"] <- mean.island2.male
df.2$means[df.2$variable == "island.2" & df.2$sex.codes == "Female"] <- mean.island2.female
df.2$means[df.2$variable == "island.3" & df.2$sex.codes == "Male"] <- mean.island3.male
df.2$means[df.2$variable == "island.3" & df.2$sex.codes == "Female"] <- mean.island3.female
# LINEAR MODEL
lizard.model <- lm(value ~ variable*sex.codes, data=df.2)
# CALCULATING RESIDUALS BY HAND:
df.2$residuals.1 <- df.2$value - df.2$means
# CONFIRMING RESIDUALS FROM LINEAR MODEL:
df.2$residuals.2 <- residuals(lizard.model)
# TWO FACTOR MAIN EFFECT ANOVA
lizard.anova <- anova(lizard.model)
# INTERACTION PLOT
interaction.plot(df.2$variable, df.2$sex.codes, df.2$value)
# SAMPLE SIZE IN EACH CELL
n <- length(df.2$value[df.2$variable == "island.1" & df.2$sex.codes == "Male"])
# > n
# [1] 2
# NOTE: JUST FOR CLARITY, PRETEND n=10
n <- 10
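# CALCULATING STANDARD ERROR FROM THE RESIDUAL MEAN SQUARE (ROW 4 OF THE ANOVA TABLE)
island.se <- sqrt(lizard.anova[["Mean Sq"]][4]/n)
# HALF CONFIDENCE INTERVAL
# NOTE: qt(0.95, df) GIVES A TWO-SIDED 90% INTERVAL; A TWO-SIDED 95% INTERVAL WOULD USE qt(0.975, df)
island.ci.half <- qt(0.95, lizard.anova$Df[4]) * island.se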
# MAKING SUMMARY DATA FRAME
summary.df <- data.frame(
Means = c(mean.island1.male,
mean.island1.female,
mean.island2.male,
mean.island2.female,
mean.island3.male,
mean.island3.female),
Location = c("island1",
"island1",
"island2",
"island2",
"island3",
"island3"),
Sex = c("male",
"female",
"male",
"female",
"male",
"female"),
CI.half = rep(island.ci.half, 6)
)
# > summary.df
# Means Location Sex CI.half
# 1 3.15 island1 male 2.165215
# 2 6.20 island1 female 2.165215
# 3 10.55 island2 male 2.165215
# 4 15.60 island2 female 2.165215
# 5 2.55 island3 male 2.165215
# 6 4.40 island3 female 2.165215
# GENERATING THE ERRORBAR PLOT
library(ggplot2)
qplot(data=summary.df,
y=Means,
x=Location,
group=Sex,
ymin=Means-CI.half,
ymax=Means+CI.half,
geom=c("point", "errorbar", "line"),
color=Sex,
shape=Sex,
width=0.25) + theme_bw()
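Answer 0 (score: 5)
I have to admit I am quite confused by your code. Please do not take this as personal criticism, but I strongly advise you to teach your students to use the power of R as much as possible. They can only benefit from it, and in my experience they understand what is going on much faster when I don't clutter things up with line after line of code.
First of all, you don't have to compute the means by hand. Just do: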
df.2$mean <- with(df.2,ave(value,sex.codes,variable,FUN=mean))
See also ?ave. This is much clearer than the clutter of code in your example. Once you have lizard.model, you can use
fitted(lizard.model)
and compare those values with the means.
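As a minimal sketch of that comparison: the data frame name `lizards` is my own stand-in, rebuilt in base R so the snippet runs without reshape. Because the model contains the interaction term, its fitted values should coincide with the cell means that ave() returns.

```r
# Lizard data in long form (same values as in the question; `lizards` is an assumed name)
lizards <- data.frame(
  value     = c(0.2, 5.9, 6.1, 6.5, 5.6, 14.8, 15.5, 16.4, 0.8, 3.9, 4.3, 4.9),
  variable  = factor(rep(c("island.1", "island.2", "island.3"), each = 4)),
  sex.codes = factor(rep(c("Male", "Female"), times = 6))
)

# Two-factor model with interaction, as in the question
lizard.model <- lm(value ~ variable * sex.codes, data = lizards)

# Cell means attached to every row in a single call
lizards$mean <- with(lizards, ave(value, sex.codes, variable, FUN = mean))

# Compare the cell means with the fitted values of the model
same <- all.equal(lizards$mean, unname(fitted(lizard.model)))
print(same)
```

If the interaction were dropped from the model formula, the fitted values would no longer match the cell means, which may be a useful contrast to show in class.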
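Then I strongly disagree with you. What you calculated is not the standard error of the prediction. To do this correctly, use the predict() function: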
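outcome <- predict(lizard.model,se.fit=TRUE)
# half-width of the 95% CI: t quantile on the residual df times the standard error of the fit
df.2$CI.half <- qt(0.975, outcome$df) * outcome$se.fit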
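To obtain confidence intervals for the predicted means, you have to use the correct formula if you want your students to understand this properly. Take a look at section 3.5 of the excellent Practical Regression and Anova using R by Faraway. It contains plenty of code examples in which everything is computed by hand in a convenient, concise manner. It will serve you and your students well. I learned a lot from it and often use it as a guide when explaining these things to students.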
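For reference, here is a small sketch of the interval in question, assuming the usual form fit ± t(0.975, residual df) × se.fit. The names `lizards` and `cells` are my own; the snippet rebuilds the data in base R and checks the hand computation against predict(..., interval = "confidence").

```r
# Lizard data in long form (same values as in the question; `lizards` is an assumed name)
lizards <- data.frame(
  value     = c(0.2, 5.9, 6.1, 6.5, 5.6, 14.8, 15.5, 16.4, 0.8, 3.9, 4.3, 4.9),
  variable  = factor(rep(c("island.1", "island.2", "island.3"), each = 4)),
  sex.codes = factor(rep(c("Male", "Female"), times = 6))
)
lizard.model <- lm(value ~ variable * sex.codes, data = lizards)

# One row per island-by-sex cell
cells <- expand.grid(variable  = levels(lizards$variable),
                     sex.codes = levels(lizards$sex.codes))

# Fitted cell means with their standard errors (based on the pooled residual variance)
pred   <- predict(lizard.model, newdata = cells, se.fit = TRUE)
t.crit <- qt(0.975, df = pred$df)

cells$Means <- pred$fit
cells$lower <- pred$fit - t.crit * pred$se.fit
cells$upper <- pred$fit + t.crit * pred$se.fit

# The same interval as computed by predict() itself
ci <- predict(lizard.model, newdata = cells, interval = "confidence", level = 0.95)
print(all.equal(cells$lower, unname(ci[, "lwr"])))
print(cells)
```

Since the design is balanced, every cell gets the same standard error, so all six intervals have the same width.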
Now, to get the summary data frame, you have several options, but this one works and is quite easy to understand.
summary.df <- unique(df.2[, c("sex.codes", "variable", "mean", "CI.half")])
names(summary.df) <- c('Sex','Location','Means','CI.half')
Now you can run your plotting code.
Alternatively, if you want prediction intervals for the values, you can use the following:
lizard.predict <- predict(lizard.model,interval='prediction')
df.2$lower <- lizard.predict[,2]
df.2$upper <- lizard.predict[,3]
summary.df <- unique(df.2[,-3])
names(summary.df)[1:3] <- c('Sex','Location','Means')
qplot(data=summary.df,
y=Means,
x=Location,
group=Sex,
ymin=lower,
ymax=upper,
geom=c("point", "errorbar", "line"),
color=Sex,
shape=Sex,
width=0.25) + theme_bw()
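qplot() has been deprecated in recent ggplot2 releases (3.4.0 onward), so here is a rough equivalent written with ggplot(). This is only a sketch: the data frame `lizards` and the helper `cells` are my own names, and the plotting part is skipped when ggplot2 is not installed.

```r
# Lizard data in long form (same values as in the question; `lizards` is an assumed name)
lizards <- data.frame(
  value     = c(0.2, 5.9, 6.1, 6.5, 5.6, 14.8, 15.5, 16.4, 0.8, 3.9, 4.3, 4.9),
  variable  = factor(rep(c("island.1", "island.2", "island.3"), each = 4)),
  sex.codes = factor(rep(c("Male", "Female"), times = 6))
)
lizard.model <- lm(value ~ variable * sex.codes, data = lizards)

# One row per cell, with prediction limits taken from the model
cells <- expand.grid(variable  = levels(lizards$variable),
                     sex.codes = levels(lizards$sex.codes))
pred  <- predict(lizard.model, newdata = cells, interval = "prediction")

summary.df <- data.frame(Location = cells$variable,
                         Sex      = cells$sex.codes,
                         Means    = pred[, "fit"],
                         lower    = pred[, "lwr"],
                         upper    = pred[, "upr"])

# Same layers as the qplot() call: points, lines and error bars
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(summary.df,
              aes(x = Location, y = Means, group = Sex, colour = Sex, shape = Sex)) +
    geom_point() +
    geom_line() +
    geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.25) +
    theme_bw()
  print(p)
}
```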
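PS: If I sound harsh here and there, that is not intended. English is not my native language, and I am still not familiar with the subtleties of the language.
Answer 1 (score: 4)
[Potentially shameless promotion] You should consider using the compareCats and rxnNorm functions in the HandyStuff package, available at www.github.com/bryanhanson/HandyStuff. Warning: I'm not sure it works seamlessly with R 2.14. In particular, rxnNorm produces something that looks like the plot you are trying to make, and it gives you various options, including summary statistics and plot decorations. However, it does require having your students install a separate package, so you may rule it out (though it does let students focus on presenting and analyzing the data). The plot from the ?rxnNorm example was included here.
With rxnNorm, you can choose among several ways of computing the CIs, controlled by the argument "method". Here are the actual functions (from the ChemoSpec package).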
# All of these functions live in namespace:ChemoSpec

# standard error of the mean
seX <- function (x) sd(x, na.rm = TRUE)/sqrt(length(na.omit(x)))

# mean +/- 1 standard error
seXy <- function (x) {
    m <- mean(na.omit(x))
    se <- seX(x)
    u <- m + se
    l <- m - se
    c(y = m, ymin = l, ymax = u)
}

# mean +/- 1.96 standard errors (normal-approximation 95% CI)
seXy95 <- function (x) {
    m <- mean(na.omit(x))
    se <- seX(x)
    u <- m + 1.96 * se
    l <- m - 1.96 * se
    c(y = m, ymin = l, ymax = u)
}

# median with lower and upper hinges
seXyIqr <- function (x) {
    i <- fivenum(x)
    c(y = i[3], ymin = i[2], ymax = i[4])
}

# median +/- median absolute deviation
seXyMad <- function (x) {
    m <- median(na.omit(x))
    d <- mad(na.omit(x))
    u <- m + d
    l <- m - d
    c(y = m, ymin = l, ymax = u)
}
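One caveat for very small cells such as the n = 2 in the question: the fixed multiplier 1.96 is a large-sample approximation. Below is a sketch of a t-based variant in the same style. The name `seXy95t` is hypothetical and, as far as I know, not part of ChemoSpec or HandyStuff.

```r
# Standard error of the mean, as in the listing above
seX <- function(x) sd(x, na.rm = TRUE) / sqrt(length(na.omit(x)))

# Hypothetical t-based counterpart of seXy95 (illustration only)
seXy95t <- function(x) {
  x    <- na.omit(x)
  m    <- mean(x)
  half <- qt(0.975, df = length(x) - 1) * seX(x)
  c(y = m, ymin = m - half, ymax = m + half)
}

# Females on island 1 from the question: only two observations
females.island1 <- c(5.9, 6.5)
print(seXy95t(females.island1))

# Ratio of the t multiplier to 1.96 for this cell size
print(qt(0.975, df = length(females.island1) - 1) / 1.96)
```

With so few observations per cell the t-based interval is much wider, which is one argument for pooling the residual variance across cells as the question does.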
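Answer 2 (score: 4)
Here is another attempt, using the sciplot package. Other ways of computing the confidence intervals can be passed in the argument ci.fun.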
library(sciplot)
lineplot.CI(variable, value, group = sex.codes, data = df.2, cex = 1.5,
            xlab = "Location", ylab = "means", cex.lab = 1.2, x.leg = 1,
            col = c("blue","red"), pch = c(16,16))
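To illustrate the ci.fun argument: it takes a function that receives the values of one cell and returns the lower and upper limits as a vector of length two. The sketch below builds such a function for a chosen confidence level. `make_ci_fun` and `lizards` are my own names, and the plotting call is skipped when sciplot is not installed.

```r
# Lizard data in long form (same values as in the question; `lizards` is an assumed name)
lizards <- data.frame(
  value     = c(0.2, 5.9, 6.1, 6.5, 5.6, 14.8, 15.5, 16.4, 0.8, 3.9, 4.3, 4.9),
  variable  = factor(rep(c("island.1", "island.2", "island.3"), each = 4)),
  sex.codes = factor(rep(c("Male", "Female"), times = 6))
)

# Hypothetical helper: returns a function suitable for ci.fun at the given level
make_ci_fun <- function(level = 0.95) {
  function(x) {
    x    <- x[!is.na(x)]
    half <- qt(1 - (1 - level) / 2, df = length(x) - 1) * sd(x) / sqrt(length(x))
    c(mean(x) - half, mean(x) + half)   # lower limit, upper limit
  }
}

ci95 <- make_ci_fun(0.95)

# Per-cell t-based intervals instead of the default mean +/- 1 SE
if (requireNamespace("sciplot", quietly = TRUE)) {
  sciplot::lineplot.CI(variable, value, group = sex.codes, data = lizards,
                       ci.fun = ci95, xlab = "Location", ylab = "means")
}
```

Note that ci.fun only sees the values of a single cell, so intervals built this way use within-cell variation rather than the pooled residual variance from the ANOVA.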