我有这个数据框:
Help_df<- data.frame(
Variety=c('Sirio CL', 'Sirio CL', 'Sirio CL', 'Sirio CL',
'Sirio CL', 'Sirio CL', 'Sirio CL', 'Sirio CL',
'Sirio CL', 'Sirio CL', 'Sirio CL', 'Sirio CL',
'Sirio CL', 'Sirio CL', 'Sirio CL', 'Augusto',
'Augusto', 'Augusto', 'Augusto', 'Augusto', 'Augusto',
'Augusto', 'Augusto', 'Augusto', 'Augusto', 'Augusto',
'Augusto', 'Augusto', 'Augusto', 'Augusto', 'Mare CL',
'Mare CL', 'Mare CL', 'Mare CL', 'Mare CL', 'Mare CL',
'Mare CL', 'Mare CL', 'Mare CL', 'Mare CL', 'Mare CL',
'Mare CL', 'Mare CL', 'Mare CL', 'Mare CL'),
Yield=c(6.98, 6.41, 6.73, 7.15, 7.32, 6.55, 6.92, 7.12,
6.77, 6.38, 6.4, 6.1, 5.9, 5.5, 5.6, 6.66, 6.51,
6.15, 6.03, 6.21, 6.8, 5.98, 6.52, 6.25, 5.56,
5.9, 5.39, 4.9, 5.6, 5, 4.25, 4.65, 4.89, 4.656,
5.32, 5.69, 5.89, 6.02, 6.32, 6.54, 6.65, 6.54,
6.87, 7.2, 6.21 ),
Index=c(333, 328, 271, 265, 281, 272, 337, 389, 276, 296, 250, 251, 200,
200, 190, 317, 371, 351, 313, 367, 338, 356,
351, 335, 295, 250, 250, 200, 175, 150, 317,
371, 351, 313, 289, 265, 298, 145, 278, 295,
250, 250, 200, 125, 198)
)
我想知道Yield
和Index
之间的相关性是否因不同的Variety
而发生变化。
这里是数据图:
ggplot(Help_df, aes(x=Yield, y=Index, color=Variety)) +
geom_point(shape=16, size=3) +
geom_smooth(method=lm, # Add linear regression lines
se=FALSE) # Don't add shaded confidence region
阅读此help我已经测试了这两个Anova
:
Anova(lm(Yield~Index*Variety,data=Help_df))
Anova(lm(formula = Yield ~ Variety + Index + Variety:Index, data = Help_df))
据我所知,术语Index:Variety
表示相关性对于不同的Variety
具有不同的斜率。
我想知道两个模型之间有什么区别,因为两个Anova
输出非常相似,如果有一种&#34;事后测试&#34;这表明哪个Variety
与其他Mare
不同(在这种情况下显然Yield
完全不同,但并不总是那么容易识别哪个因素不同)。
此外,我已经尝试了@PAC提出的解决方案来使用&#34; Chow测试&#34;正如您在上面发布的链接中看到的那样。
这个测试可能是最好的,因为它比较斜率+截距。但p值为1表示Index
和Variety
之间的相关性与mc <- lm(formula = Index ~ YIELD, data = Help)
m1 <- lm(formula = Index ~ YIELD, data = subset(Help, Variety == "'Augusto'"))
m2 <- lm(formula = Index ~ YIELD, data = subset(Help, Variety == "'Sirio CL'"))
m3 <- lm(formula = Index ~ YIELD, data = subset(Help, Variety == "'Mare CL'"))
sc <- sum(mc$residuals^2)
s1 <- sum(m1$residuals^2)
s2 <- sum(m2$residuals^2)
s3 <- sum(m3$residuals^2)
k <- 3
# Test statistic
fstat <- (sc - (s1 + s2 + s3)) / k / (s1 + s2 + s3) * (length(mc$residuals) - 2*k)
fstat
# Rejection region
qf(.95,k, length(mc$residuals) - 2*k)
# Pvalue
pf(fstat,k, length(mc$residuals) - 2*k)
不同,这与我在数据中观察到的不一致。
display: none
答案 0 :(得分:1)
两种方差分析不仅相似,而且相同。阅读help("formula")
。
对于成对比较,我会这样做:
m12 <- lm(formula = Yield ~ Index,
data = subset(Help_df, Variety %in% c('Augusto', 'Sirio CL')))
m12v <- lm(formula = Yield ~ Index * Variety,
data = subset(Help_df, Variety %in% c('Augusto', 'Sirio CL')))
m13 <- lm(formula = Yield ~ Index,
data = subset(Help_df, Variety %in% c('Augusto', 'Mare CL')))
m13v <- lm(formula = Yield ~ Index * Variety,
data = subset(Help_df, Variety %in% c('Augusto', 'Mare CL')))
m23 <- lm(formula = Yield ~ Index,
data = subset(Help_df, Variety %in% c('Sirio CL', 'Mare CL')))
m23v <- lm(formula = Yield ~ Index * Variety,
data = subset(Help_df, Variety %in% c('Sirio CL', 'Mare CL')))
p.adjust(
c(anova(m12, m12v)$"Pr(>F)"[2],
anova(m13, m13v)$"Pr(>F)"[2],
anova(m23, m23v)$"Pr(>F)"[2]),
method = "holm")
#[1] 1.290727e-04 2.845623e-05 1.340764e-05