Question

我在数据框 dune（下面 - 页面底部）中有三列描述了为三种不同的沙丘生态系统记录的marram草的覆盖百分比：

（1）恢复; （2）退化;和（3）自然;

我已经进行了两种不同的One Way Anova测试（下面） - 测试1和测试2 - 以建立生态系统之间的显着差异。测试1明确显示了生态系统之间的显着差异;但是，测试2显示没有显着差异。箱形图（下图）显示了生态系统之间差异的明显差异。

之后，我将数据帧融合以产生一个因子列（即，Ecosystem.Type），它也是响应变量。这个想法是应用一个glm模型（测试3 - 下面）用One Way Anova进行测试;但是，此方法不成功（请在下面找到错误消息）。

问题

我很困惑我的代码执行每个One Way Anova测试是否正确以及执行事后测试（土耳其HSD，Scheffe或其他）的正确程序来区分显着不同的生态系统对。如果有人可以提供帮助，我将非常感谢您的建议。非常感谢....

data(dune)

测试1

dune.type.1<-aov(Natural~Restored+Degraded, data=dune)
summary.aov(dune.type.1, intercept=T)

               Df Sum Sq Mean Sq F value   Pr(>F)    
     (Intercept)  1  34694   34694 138.679 1.34e-09 ***
     Restored     1     94      94   0.375    0.548    
     Degraded     1    486     486   1.942    0.181    
     Residuals   17   4253     250                     
           ---
  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

事后测试

    posthoc<-TukeyHSD(dune.type.1, conf.level=0.95)

    Error in TukeyHSD.aov(dune.type.1, conf.level = 0.95) : 

    no factors in the fitted model

    In addition: Warning messages:
    1: In replications(paste("~", xx), data = mf) :
       non-factors ignored: Restored
    2: In replications(paste("~", xx), data = mf) :
       non-factors ignored: Degraded

测试2

     dune1<-aov(Restored~Natural, data=dune)
     dune2<-aov(Restored~Degraded, data=dune)
     dune3<-aov(Degraded~Natural, data=dune)

     summary(dune1)

                 Df Sum Sq Mean Sq F value Pr(>F)
     Natural      1     86   85.58   0.356  0.558
     Residuals   18   4325  240.26               

    summary(dune2)

                 Df Sum Sq Mean Sq F value Pr(>F)
     Degraded     1    160   159.7   0.676  0.422
     Residuals   18   4250   236.1               

     summary(dune3)

                 Df Sum Sq Mean Sq F value Pr(>F)
     Natural      1  168.5  168.49   2.318  0.145
     Residuals   18 1308.5   72.69

测试3

melt.dune<-melt(dune, measure.vars=c("Degraded", "Restored", "Natural"))


colnames(melt.dune)=c("Ecosystem.Type", "Percentage.cover")
melt.dune$Percentage.cover<-as.numeric(melt.dune$Percentage.cover)

glm.dune<-glm(Ecosystem.Type~Percentage.cover, data=melt.dune)
summary(glm.dune)

Error

glm.dune<-glm(Ecosystem.Type~Percentage.cover, data=melt.dune)
Error in glm.fit(x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  : 
NA/NaN/Inf in 'y'
In addition: Warning messages:
1: In Ops.factor(y, mu) : ‘-’ not meaningful for factors
2: In Ops.factor(eta, offset) : ‘-’ not meaningful for factors
3: In Ops.factor(y, mu) : ‘-’ not meaningful for factors

融合数据框

structure(list(Ecosystem.Type = structure(c(1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Degraded", "Restored", 
"Natural"), class = "factor"), Percentage.cover = c(12, 17, 21, 
11, 22, 16, 7, 9, 14, 2, 3, 15, 23, 4, 19, 36, 26, 4, 15, 23, 
38, 46, 65, 35, 54, 29, 48, 13, 19, 33, 37, 55, 11, 53, 13, 24, 
28, 44, 42, 39, 18, 61, 31, 46, 51, 51, 41, 44, 55, 47, 73, 43, 
25, 42, 21, 13, 65, 30, 47, 29)), row.names = c(NA, -60L), .Names =         c("Ecosystem.Type", 
 "Percentage.cover"), class = "data.frame")

数据

 structure(list(Degraded = c(12L, 17L, 21L, 11L, 22L, 16L, 7L, 
 9L, 14L, 2L, 3L, 15L, 23L, 4L, 19L, 36L, 26L, 4L, 15L, 23L), 
 Restored = c(38L, 46L, 65L, 35L, 54L, 29L, 48L, 13L, 19L, 
 33L, 37L, 55L, 11L, 53L, 13L, 24L, 28L, 44L, 42L, 39L), Natural = c(18L, 
 61L, 31L, 46L, 51L, 51L, 41L, 44L, 55L, 47L, 73L, 43L, 25L, 
 42L, 21L, 13L, 65L, 30L, 47L, 29L)), .Names = c("Degraded", 
 "Restored", "Natural"), class = "data.frame", row.names = c(NA, 
 -20L))

Answer 1

有几件事我想指点你。

首先，测试1和测试2产生类似的结果。唯一的区别是你在测试1上选择了一个截距，因此结果告诉你，如果你适合一个线性模型（我将在几分钟内完成），需要拦截。因此，你看到的重要性在于你强行适合的线是否需要拦截。尝试对其他结果使用“intercept = T”，我很确定你会得到类似的结果。

您应该注意的第二件事是关于您尝试适合的线性模型。 dune.type.1模型是一个模型，您可以实际看到不同沙丘生态系统的相关性。换句话说，你假设自然和恢复之间存在线性关联，并且随着恢复的每个单位增加，你对自然有一些增加。如果我理解你想要的是检查平均差异而不是它们的相关性。因此，你可以做两件事：

准备数据以执行t.tests（比较几个类别之间的平均值的测试）。由于所有变量都相当正常，所以在R中很容易做到并且有效。但是，您将遇到多个测试问题（您可能会执行3次t检验以获得所有均值差异），因此需要使用Bonferroni校正。
但我认为您真正想要的是以下内容：

首先改革数据

       data <- data.frame(v = c(dune$Degraded, dune$Restored, dune$Natural), 
                   labels = c(rep("Degraded", times = 20), rep("Restored", times = 20), 
                              rep("Natural", times = 20)))

然后拟合线性模型

    mod.1 <- lm(v ~ labels, data = data)
    summary(mod.1)
    lm(formula = v ~ labels, data = data)

Residuals:
Min      1Q  Median      3Q     Max 
-28.650 -10.725   0.875   8.050  31.350 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)      14.950      3.066   4.875 9.07e-06 ***
labelsNatural    26.700      4.337   6.157 7.95e-08 ***
labelsRestored   21.350      4.337   4.923 7.64e-06 ***

您可以实际看到基线类别（即降级）的平均值与自然类别等的平均值相比明显更小。您还可以检查模型假设，看看您的模型是否合适

    qqnorm(residuals(mod.1))
    qqline(residuals(mod.1))

它们的残差是合理正常的，所以模型很好。您也可以按照您的anova方法进行操作：

    anova.model <- aov(v ~ labels, data = data))
    summary(anova.model)

             Df Sum Sq Mean Sq F value   Pr(>F)    
 labels       2   7982    3991   21.22 1.29e-07 ***
 Residuals   57  10720     188

表明沙丘生态系统的平均值之间至少存在一个显着差异，并且逐点跟踪Tukey：

    post <- TukeyHSD(aov(v ~ labels, data = data))
    plot(post, ylim = c(0, 4))

已经针对多次测试进行了调整:)

R：单向Anova和成对事后测试（土耳其，Scheffe或其他）用于数值柱

1 个答案: