Clarification on comparing regression lines

Time: 2015-09-17 07:20:49

Tags: r regression anova

I have this data frame:

Help_df<- data.frame(
  Variety=c('Sirio CL', 'Sirio CL', 'Sirio CL', 'Sirio CL', 
            'Sirio CL', 'Sirio CL', 'Sirio CL', 'Sirio CL',
            'Sirio CL', 'Sirio CL', 'Sirio CL', 'Sirio CL',
            'Sirio CL', 'Sirio CL', 'Sirio CL', 'Augusto', 
            'Augusto', 'Augusto', 'Augusto', 'Augusto', 'Augusto', 
            'Augusto', 'Augusto', 'Augusto', 'Augusto', 'Augusto',
            'Augusto', 'Augusto', 'Augusto', 'Augusto', 'Mare CL', 
            'Mare CL', 'Mare CL', 'Mare CL', 'Mare CL', 'Mare CL',
            'Mare CL', 'Mare CL', 'Mare CL', 'Mare CL', 'Mare CL', 
            'Mare CL', 'Mare CL', 'Mare CL', 'Mare CL'),
  Yield=c(6.98, 6.41, 6.73, 7.15, 7.32, 6.55, 6.92, 7.12, 
          6.77, 6.38, 6.4, 6.1, 5.9, 5.5, 5.6, 6.66, 6.51,
          6.15, 6.03, 6.21, 6.8, 5.98, 6.52, 6.25, 5.56,
          5.9, 5.39, 4.9, 5.6, 5, 4.25, 4.65, 4.89, 4.656,
          5.32, 5.69, 5.89, 6.02, 6.32, 6.54, 6.65, 6.54,
          6.87, 7.2, 6.21 ),
  Index=c(333, 328, 271, 265, 281, 272, 337, 389, 276, 296, 250, 251, 200, 
          200, 190, 317, 371, 351, 313, 367, 338, 356, 
          351, 335, 295, 250, 250, 200, 175, 150, 317,
          371, 351, 313, 289, 265, 298, 145, 278, 295,
          250, 250, 200, 125, 198)
)

I would like to know whether the relationship between Yield and Index changes across the different levels of Variety. Here is a plot of the data:

library(ggplot2)

ggplot(Help_df, aes(x = Yield, y = Index, color = Variety)) +
  geom_point(shape = 16, size = 3) +
  geom_smooth(method = "lm",  # Add one linear regression line per Variety
              se = FALSE)     # Don't add shaded confidence region

I wonder whether these regression lines are different.

After reading this help, I tested these two Anova calls:

   library(car)  # provides Anova()
   Anova(lm(Yield ~ Index * Variety, data = Help_df))
   Anova(lm(formula = Yield ~ Variety + Index + Variety:Index, data = Help_df))

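As far as I understand, the term Index:Variety means that the relationship has a different slope for each Variety. I would like to know what the difference between the two models is, since the two Anova outputs are very similar, and whether there is some kind of "post hoc test" that shows which Variety differs from the others (in this case Mare CL is obviously completely different, but it is not always so easy to identify which level differs).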

In addition, I tried the solution proposed by @PAC, the "Chow test", as you can see in the link posted above. This test is probably the best option because it compares slope and intercept together:

mc <- lm(formula = Index ~ Yield, data = Help_df)
m1 <- lm(formula = Index ~ Yield, data = subset(Help_df, Variety == 'Augusto'))
m2 <- lm(formula = Index ~ Yield, data = subset(Help_df, Variety == 'Sirio CL'))
m3 <- lm(formula = Index ~ Yield, data = subset(Help_df, Variety == 'Mare CL'))
sc <- sum(mc$residuals^2)
s1 <- sum(m1$residuals^2)
s2 <- sum(m2$residuals^2)
s3 <- sum(m3$residuals^2)
k <- 3
# Test statistic
fstat <- (sc - (s1 + s2 + s3)) / k / (s1 + s2 + s3) * (length(mc$residuals) - 2*k)
fstat
# Rejection region
qf(.95, k, length(mc$residuals) - 2*k)
# Pvalue
pf(fstat, k, length(mc$residuals) - 2*k)

But the p-value of 1 indicates that the relationship between Index and Yield does not differ between the levels of Variety, which is inconsistent with what I observe in the data.


1 Answer:

Answer 0 (score: 1)

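The two ANOVAs are not just similar, they are identical. Read help("formula"): in a model formula, Index*Variety is shorthand for Index + Variety + Index:Variety, so both calls fit the same model.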
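One way to convince yourself, without comparing Anova tables by eye, is to fit both spellings and compare the fits directly. This is only a minimal sketch: it uses the built-in iris data as a stand-in (not the Help_df from the question), and the names m_star and m_long are mine.

```r
# Stand-in data: built-in iris, with Petal.Length playing the role of Index
# and Species the role of Variety (an assumption made for illustration only).
m_star <- lm(Sepal.Length ~ Petal.Length * Species, data = iris)
m_long <- lm(Sepal.Length ~ Species + Petal.Length + Species:Petal.Length,
             data = iris)

# Both spellings describe the same model, so the fitted values and the
# residual sum of squares agree (up to floating-point tolerance).
same_fit <- isTRUE(all.equal(fitted(m_star), fitted(m_long)))
same_rss <- isTRUE(all.equal(deviance(m_star), deviance(m_long)))
print(c(same_fit = same_fit, same_rss = same_rss))
```

Note that the order of terms does matter for the sequential (type I) table from base anova(), whereas Anova() from car reports type II tests by default, which do not depend on term order.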

For pairwise comparisons, I would do this:

m12  <-  lm(formula = Yield ~ Index, 
            data = subset(Help_df, Variety %in% c('Augusto', 'Sirio CL')))
m12v  <- lm(formula = Yield ~ Index * Variety, 
            data = subset(Help_df, Variety %in% c('Augusto', 'Sirio CL')))

m13 <- lm(formula = Yield ~ Index, 
          data = subset(Help_df, Variety %in% c('Augusto', 'Mare CL')))
m13v  <-  lm(formula = Yield ~ Index * Variety, 
             data = subset(Help_df, Variety %in% c('Augusto', 'Mare CL')))

m23  <-  lm(formula = Yield ~ Index, 
            data = subset(Help_df, Variety %in% c('Sirio CL', 'Mare CL')))
m23v  <-  lm(formula = Yield ~ Index * Variety, 
             data = subset(Help_df, Variety %in% c('Sirio CL', 'Mare CL')))

p.adjust(
  c(anova(m12, m12v)$"Pr(>F)"[2],
    anova(m13, m13v)$"Pr(>F)"[2],
    anova(m23, m23v)$"Pr(>F)"[2]),
  method = "holm")
#[1] 1.290727e-04 2.845623e-05 1.340764e-05
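
As for the Chow test in the question: pf() returns the lower-tail probability P(F <= fstat) by default, so the value of 1 printed there is not the p-value; the p-value is the upper tail. Also, with three varieties the pooled fit has 2 parameters and the separate fits have 6 in total, so I would expect 4 numerator and n - 6 denominator degrees of freedom. The sketch below is one way to write that out, not a definitive implementation. It rebuilds Help_df compactly, assuming 15 consecutive rows per variety in the order listed in the question, and names such as rss_pooled and df_num are mine.

```r
# Data from the question, rebuilt compactly.
# Assumption: 15 consecutive rows per variety, in the order listed.
Help_df <- data.frame(
  Variety = rep(c("Sirio CL", "Augusto", "Mare CL"), each = 15),
  Yield = c(6.98, 6.41, 6.73, 7.15, 7.32, 6.55, 6.92, 7.12, 6.77,
            6.38, 6.4, 6.1, 5.9, 5.5, 5.6,
            6.66, 6.51, 6.15, 6.03, 6.21, 6.8, 5.98, 6.52, 6.25,
            5.56, 5.9, 5.39, 4.9, 5.6, 5,
            4.25, 4.65, 4.89, 4.656, 5.32, 5.69, 5.89, 6.02, 6.32,
            6.54, 6.65, 6.54, 6.87, 7.2, 6.21),
  Index = c(333, 328, 271, 265, 281, 272, 337, 389, 276,
            296, 250, 251, 200, 200, 190,
            317, 371, 351, 313, 367, 338, 356, 351, 335,
            295, 250, 250, 200, 175, 150,
            317, 371, 351, 313, 289, 265, 298, 145, 278,
            295, 250, 250, 200, 125, 198)
)

# One common line for all varieties vs. one line per variety.
pooled   <- lm(Index ~ Yield, data = Help_df)
separate <- lm(Index ~ Yield * Variety, data = Help_df)

rss_pooled   <- deviance(pooled)    # residual sum of squares, 2 parameters
rss_separate <- deviance(separate)  # same as s1 + s2 + s3 from per-variety fits

n      <- nrow(Help_df)
df_num <- 6 - 2   # 3 intercepts + 3 slopes, minus 1 intercept + 1 slope
df_den <- n - 6

fstat <- ((rss_pooled - rss_separate) / df_num) / (rss_separate / df_den)

# Upper-tail probability: this is the p-value.
p_value <- pf(fstat, df_num, df_den, lower.tail = FALSE)

# The same test expressed as a nested-model comparison.
cmp <- anova(pooled, separate)
print(c(manual = p_value, anova = cmp$"Pr(>F)"[2]))
```

The nested-model anova() call is the same comparison that the pairwise code above applies to two varieties at a time, here applied to all three at once.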