Very simple anova model - comparing a variable with a factor vs. separate vectors

Time: 2012-11-11 08:55:17

Tags: r anova

I think my question is best understood with some code:

#Load data
b <- structure(list(s1 = c(18.5, 24, 17.2, 19.9, 18), s2 = c(26.3, 
     25.3, 24, 21.2, 24.5), s3 = c(20.6, 25.2, 20.8, 24.7, 22.9), 
     s4 = c(25.5, 19.9, 22.6, 17.5, 20.4)), .Names = c("s1", "s2", 
     "s3", "s4"), row.names = c(NA, -5L), class = "data.frame")

# Model A
# One way (the wrong way) to test whether s1, s2, s3 and s4 differ:
summary(aov(s1 ~ s2 + s3 + s4, data = b))
# R does not complain here - but I don't know what I am doing. I guess I am trying
# to explain the variance in s1 with the variables s2, s3 and s4.
# I am not sure how this actually differs from a proper anova (see below).
# Also, I don't understand why the Sum of Squares for s3 is much larger than the
# Sum of Squares for s2 and s4.

# Model B
# The correct way to do it (requires the reshape package)
# install.packages('reshape')
library(reshape)

summary(aov(value ~ variable, data = melt(b)))
# This is correct - here I am testing whether the levels of the factor 'variable'
# explain the variation in 'value'.
# Doing
TukeyHSD(aov(value ~ variable, data = melt(b)))
# shows me that s1 is significantly different from s2.
# My way of thinking is that this result should be evident from "model A".
# What does Sum of Squares in model A mean? - why is it so big for s3?

To sum up the comments in the code above: I am asking for an explanation of why, and in what way, model A is wrong.

1 answer:

Answer 0: (score: 2)

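Model A is not a one-way ANOVA. You are modelling one response variable (s1) with s2, s3 and s4 as predictors; since all of the predictors are continuous, this is a multiple regression on covariates (loosely, an analysis of covariance). The reason the Sum of Squares for s3 is so large becomes apparent if you look at the correlation matrix; cor(b) will tell you: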

            s1          s2         s3         s4
s1  1.00000000  0.08009315  0.7929146 -0.4200527
s2  0.08009315  1.00000000 -0.4433499  0.7846037
s3  0.79291464 -0.44334985  1.0000000 -0.8725241
s4 -0.42005268  0.78460371 -0.8725241  1.0000000
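
The link between these correlations and the Sum of Squares column can be checked directly. The following is a minimal sketch, assuming the data frame b from the question; the names ss_total, ss_A, ss_first and ss_s3_first are illustrative only. It relies on the fact that aov() and anova() report sequential sums of squares, so the value for the first term equals its squared correlation with s1 times the total sum of squares of s1, and the values change when the terms are entered in a different order.

```r
# Data from the question, rebuilt with data.frame() for readability
b <- data.frame(
  s1 = c(18.5, 24, 17.2, 19.9, 18),
  s2 = c(26.3, 25.3, 24, 21.2, 24.5),
  s3 = c(20.6, 25.2, 20.8, 24.7, 22.9),
  s4 = c(25.5, 19.9, 22.6, 17.5, 20.4)
)

# Total sum of squares of the response s1 around its mean
ss_total <- sum((b$s1 - mean(b$s1))^2)

# Sequential sums of squares for model A (same numbers as summary(aov(...)))
ss_A <- anova(lm(s1 ~ s2 + s3 + s4, data = b))[["Sum Sq"]]
names(ss_A) <- c("s2", "s3", "s4", "Residuals")
print(ss_A)

# For the term entered first, SS = r^2 * total SS
ss_first <- cor(b$s1, b$s2)^2 * ss_total
print(c(from_anova = ss_A[["s2"]], from_cor = ss_first))

# Entering s3 first changes what is attributed to each predictor
ss_s3_first <- anova(lm(s1 ~ s3 + s2 + s4, data = b))[["Sum Sq"]][1]
print(ss_s3_first)
```

Because the sums of squares depend on the order of the terms, they describe how much of the variation in s1 each predictor explains after the ones before it. They say nothing about whether the four columns have different means.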

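s1 is much more strongly correlated with s3 (0.79) than with s2 (0.08) or s4 (-0.42), and aov() reports sequential sums of squares, so s3 takes the largest share of the variation in s1. You can't compare this to model B. In model B, s1 is not the response but one level of the predictor (the class: s1, s2, s3 or s4), and the response is the pooled measurements in value.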
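
For comparison, model B can also be fitted without the reshape package. This is a sketch only, assuming the same data frame b; it swaps melt() for base R's stack(), which names its columns values and ind rather than value and variable. The names long and fit_B are illustrative.

```r
# Data from the question, rebuilt with data.frame() for readability
b <- data.frame(
  s1 = c(18.5, 24, 17.2, 19.9, 18),
  s2 = c(26.3, 25.3, 24, 21.2, 24.5),
  s3 = c(20.6, 25.2, 20.8, 24.7, 22.9),
  s4 = c(25.5, 19.9, 22.6, 17.5, 20.4)
)

# Long format: one row per measurement
#   values = the measurement, ind = factor naming the source column (s1..s4)
long <- stack(b)
str(long)

# One-way ANOVA: does the mean of 'values' differ between the levels of 'ind'?
fit_B <- aov(values ~ ind, data = long)
print(summary(fit_B))

# Pairwise comparisons between the four groups
print(TukeyHSD(fit_B))
```

Here the grouping factor has 4 levels and there are 20 observations, so the ANOVA table has 3 degrees of freedom for ind and 16 for the residuals, whereas model A leaves only 1 residual degree of freedom.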