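For my experiment, I clipped plants and measured their responses, such as the leaf mass produced by the end of the season. I manipulated both clipping intensity and clipping timing, and crossed these two treatments. I also included a control clipping treatment, giving 5 different clipping treatment combinations. With 12 plants per treatment, I had a total of 60 plants, which I followed over two years. That is, I collected measurements on these 60 plants in year 1, and on the same plants again in year 2.
Here is my design, in which "never" for timing and "zero" for intensity arbitrarily stand in for the "control" treatment: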
year timing intensity treatment
2015 early high early-high
2015 early low early-low
2015 late high late-high
2015 late low late-low
2015 never zero control
2014 early high early-high
2014 early low early-low
2014 late high late-high
2014 late low late-low
2014 never zero control
I followed a piece of advice from Ben Bolker, ignored the warnings from running lme4, and then ran F tests on the model (R- analyzing repeated measures unbalanced design with lme4?):
m1<-lmer(log(plant.leaf.g)~timing*intensity*year+(1|id), data=cmv)
Anova(m1, type="III", test="F")
The Anova output gave me a significant interaction between timing and intensity (p = 0.006), so I then ran a multiple comparisons test using:
cmv$SHD<-interaction(cmv$timing, cmv$intensity)
m2<-lmer(log(plant.leaf.g)~-1+SHD+(1|id),data=cmv, na.action=na.exclude)
summary(glht(m2, linfct=mcp(SHD="Tukey")))
Here is a clip of my output, in which the only pair that is even marginally significant has p = 0.08:
Estimate Std. Error z value Pr(>|z|)
late.2014 - early.2014 == 0 -0.6584 0.3448 -1.910 0.3844
never.2014 - early.2014 == 0 0.1450 0.4102 0.354 0.9992
early.2015 - early.2014 == 0 -0.4906 0.2786 -1.761 0.4788
late.2015 - early.2014 == 0 -0.1687 0.3494 -0.483 0.9965
never.2015 - early.2014 == 0 0.4201 0.4079 1.030 0.9032
never.2014 - late.2014 == 0 0.8034 0.4119 1.951 0.3597
early.2015 - late.2014 == 0 0.1678 0.3419 0.491 0.9963
late.2015 - late.2014 == 0 0.4897 0.2724 1.797 0.4553
never.2015 - late.2014 == 0 1.0785 0.4119 2.618 0.0885 .
early.2015 - never.2014 == 0 -0.6356 0.4074 -1.560 0.6133
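Why does Anova find timing*intensity highly significant when nothing shows up as significant in my multiple comparisons test? Is there another way I can do the multiple comparisons?
In other multiple comparisons outputs I get p values as high as 1.00000. Is that normal?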
data<-structure(list(id = c(91L, 92L, 93L, 94L, 95L, 96L, 97L, 98L,
99L, 100L, 101L, 102L, 103L, 105L, 106L, 107L, 108L, 109L, 110L,
111L, 112L, 113L, 114L, 115L, 116L, 117L, 119L, 120L, 121L, 122L,
123L, 124L, 125L, 126L, 127L, 128L, 129L, 130L, 131L, 132L, 133L,
134L, 135L, 136L, 137L, 138L, 139L, 140L, 141L, 142L, 143L, 144L,
146L, 147L, 148L, 149L, 150L, 91L, 92L, 93L, 94L, 95L, 96L, 97L,
98L, 99L, 100L, 101L, 102L, 103L, 105L, 106L, 107L, 108L, 109L,
110L, 111L, 112L, 113L, 114L, 115L, 116L, 117L, 119L, 120L, 121L,
122L, 123L, 124L, 125L, 126L, 127L, 128L, 129L, 130L, 131L, 132L,
133L, 134L, 135L, 136L, 137L, 138L, 139L, 140L, 141L, 142L, 143L,
144L, 146L, 147L, 148L, 149L, 150L), quad = c(2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), year = c(2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L, 2015L,
2015L, 2015L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L, 2014L,
2014L, 2014L, 2014L, 2014L, 2014L), timing = structure(c(1L,
3L, 2L, 1L, 1L, 2L, 3L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 3L, 1L, 1L,
3L, 2L, 3L, 1L, 3L, 2L, 3L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L,
1L, 3L, 3L, 2L, 2L, 1L, 2L, 3L, 2L, 1L, 1L, 2L, 2L, 3L, 1L, 2L,
2L, 2L, 1L, 1L, 2L, 1L, 1L, 3L, 1L, 3L, 2L, 1L, 1L, 2L, 3L, 2L,
2L, 2L, 2L, 1L, 1L, 2L, 3L, 1L, 1L, 3L, 2L, 3L, 1L, 3L, 2L, 3L,
1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 3L, 3L, 2L, 2L, 1L, 2L,
3L, 2L, 1L, 1L, 2L, 2L, 3L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 3L,
1L), .Label = c("early", "late", "never"), class = "factor"),
intensity = structure(c(2L, 3L, 1L, 2L, 1L, 2L, 3L, 1L, 2L,
1L, 2L, 1L, 1L, 2L, 3L, 2L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L,
1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 3L, 3L, 2L, 1L, 1L,
1L, 3L, 1L, 1L, 2L, 2L, 1L, 3L, 2L, 1L, 2L, 1L, 1L, 2L, 2L,
2L, 1L, 3L, 2L, 3L, 1L, 2L, 1L, 2L, 3L, 1L, 2L, 1L, 2L, 1L,
1L, 2L, 3L, 2L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 2L, 1L,
1L, 2L, 2L, 2L, 2L, 1L, 2L, 3L, 3L, 2L, 1L, 1L, 1L, 3L, 1L,
1L, 2L, 2L, 1L, 3L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 3L, 2L
), .Label = c("high", "low", "zero"), class = "factor"),
treatment = structure(c(3L, 1L, 4L, 3L, 2L, 5L, 1L, 4L, 5L,
4L, 5L, 2L, 2L, 5L, 1L, 3L, 2L, 1L, 4L, 1L, 2L, 1L, 4L, 1L,
2L, 3L, 2L, 4L, 3L, 5L, 5L, 3L, 2L, 3L, 1L, 1L, 5L, 4L, 2L,
4L, 1L, 4L, 2L, 3L, 5L, 4L, 1L, 3L, 4L, 5L, 4L, 2L, 3L, 5L,
3L, 2L, 1L, 3L, 1L, 4L, 3L, 2L, 5L, 1L, 4L, 5L, 4L, 5L, 2L,
2L, 5L, 1L, 3L, 2L, 1L, 4L, 1L, 2L, 1L, 4L, 1L, 2L, 3L, 2L,
4L, 3L, 5L, 5L, 3L, 2L, 3L, 1L, 1L, 5L, 4L, 2L, 4L, 1L, 4L,
2L, 3L, 5L, 4L, 1L, 3L, 4L, 5L, 4L, 4L, 3L, 5L, 2L, 1L, 3L
), .Label = c("control", "early-high", "early-low", "late-high",
"late-low"), class = "factor"), plant.leaf.g = c(846.216,
382.704, 2393.088, 61.832, 1315.86, 275.816, 3705.862, 3500.52,
67.482, 432, 487.492, 1228.618, 776.16, 1575, 735.9, 2417.75,
1342.92, 2359.046, 686.726, 1385.856, 343.684, 2277.312,
465.528, 2314.584, 508.4, 1243.644, 1064.448, 1020.646, NA,
494.832, 1318.248, 1516.4, 1271.218, 512.512, 157.878, 3753.992,
586.032, 1042.176, 889.632, 651.052, 498.042, 625.872, 16.28,
497.51, 593.75, 706.84, 2238.742, 232.584, 671.532, 90.72,
1412.442, 902.728, 3077.184, 619.106, 0.576, 400.452, 684.522,
849.852, 152.76, 1280.448, 274.47, 387.614, 98.496, 2304.504,
644.952, 35.392, 250.56, 267.33, 2212.08, 2392.596, 751.944,
629.418, 731.544, 1013.196, 1516.4, 130.536, 2910.6, 554.4,
2163.35, 223.86, 2369.376, 551.976, 985.6, 1482.24, 815.386,
1664.132, 596.376, 1581.432, 217.128, 1041.656, 951.168,
256.172, 1587.148, 359.448, 546.48, 1226.544, 371.64, 293.504,
177.726, 343.26, 691.24, 207.604, 588.924, 1405.258, 136.17,
451.432, 576.18, 424.804, 884.534, 2466.45, 1524.432, 973.208,
369.474, 410.048)), .Names = c("id", "quad", "year", "timing",
"intensity", "treatment", "plant.leaf.g"), class = "data.frame", row.names = c(NA,
-114L))
PS. I can't for the life of me get lsmeans to work with this unbalanced design. The output reports a lot of NAs.
Answer 0 (score: 0)
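I had not read your question carefully enough at first to notice that you did not try lsmeans on a model with treatment in place of timing*intensity (and, for some reason, the dataset you provided uses different names, with treatment in place of SHD). If you do that, it works fine: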
> m3<-lmer(log(plant.leaf.g) ~ treatment+year+(1|id), data=data)
> library(lsmeans)
> lsmeans(m3, "treatment", type = "response")
Loading required namespace: pbkrtest
treatment response SE df lower.CL upper.CL
control 1017.7290 289.1544 62.29 576.7671 1795.8244
early-high 909.2335 260.3904 68.44 513.4725 1610.0288
early-low 388.1875 116.3790 65.92 213.3433 706.3242
late-high 626.5379 176.6823 56.61 356.1791 1102.1134
late-low 393.3225 125.4142 51.60 207.4053 745.8947
Confidence level used: 0.95
Intervals are back-transformed from the log scale
> pairs(.Last.value, type = "response")
contrast response.ratio SE df t.ratio p.value
control - early-high 1.1193264 0.4457451 72.14 0.283 0.9986
control - early-low 2.6217461 1.0685111 71.26 2.365 0.1371
control - late-high 1.6243693 0.6501965 59.75 1.212 0.7445
control - late-low 2.5875182 1.1050617 56.07 2.226 0.1853
early-high - early-low 2.3422535 0.9581385 74.29 2.081 0.2394
early-high - late-high 1.4512026 0.5762732 68.49 0.938 0.8811
early-high - late-low 2.3116745 0.9907174 58.53 1.955 0.3006
early-low - late-high 0.6195754 0.2549274 61.77 -1.163 0.7719
early-low - late-low 0.9869446 0.4319594 57.88 -0.030 1.0000
late-high - late-low 1.5929371 0.6780835 53.73 1.094 0.8090
P value adjustment: tukey method for comparing a family of 5 estimates
Tests are performed on the log scale
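As a side note on the p value of 1.0000 in the table above: a Tukey-adjusted p value is an upper-tail probability of the studentized range distribution, so a t ratio close to zero gives an adjusted p value that rounds to 1. The sketch below recomputes two rows of the table with base R's ptukey. The formula in tukey_p (a hypothetical helper name) is an assumption about how the "tukey method" footnote is implemented, not something taken from the lsmeans source.

```r
# Tukey-adjusted p value for one pairwise comparison in a family of k means.
# Assumed formula: upper tail of the studentized range at sqrt(2) * |t|.
tukey_p <- function(t_ratio, k, df) {
  ptukey(sqrt(2) * abs(t_ratio), nmeans = k, df = df, lower.tail = FALSE)
}

# t ratios and df copied from the pairs() table
p_near_one <- tukey_p(-0.030, k = 5, df = 57.88)  # early-low - late-low
p_smallest <- tukey_p(2.365, k = 5, df = 71.26)   # control - early-low

round(c(p_near_one = p_near_one, p_smallest = p_smallest), 4)
```

If the assumption holds, the two values should land close to the 1.0000 and 0.1371 reported in the table.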
Now, about your original question. We see that the smallest P value above is about 0.13, whereas
> library(car)
> Anova(m3)
Analysis of Deviance Table (Type II Wald chisquare tests)
Response: log(plant.leaf.g)
Chisq Df Pr(>Chisq)
treatment 9.5752 4 0.04823
year 0.1147 1 0.73484
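So the P value of the ANOVA test for treatment is about 0.05. An ANOVA F test that is barely significant is equivalent to saying that some contrast among the treatment levels is barely significant according to the Scheffe critical value. If that contrast happened to be a pairwise comparison, or nearly one, then that pairwise comparison would be significant. But that is not the case here. The first and second means are a lot higher than the third and fifth, which led me to this contrast, and it is quite significant: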
> contrast(lsmeans(m3, "treatment"), list(my.con = c(1, 1, -1, 0, -1)))
contrast estimate SE df t.ratio p.value
my.con 1.801813 0.5910685 64.56 3.048 0.0033
Results are given on the log (not the response) scale.
Tests are performed on the log scale
Another contrast that is somewhat significant is the comparison of high versus low intensity:
> contrast(lsmeans(m3, "treatment"), list(hi.lo = c(0, 1, -1, 1, -1)/2))
contrast estimate SE df t.ratio p.value
hi.lo 0.6583465 0.2967584 60.45 2.218 0.0303
Keep in mind that the ANOVA test does not guarantee much of anything about pairwise comparisons.
Answer 1 (score: 0)
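The link between the overall F test and contrasts can be checked numerically. The sketch below uses simulated data from a balanced one-way layout (not the clipping data; the group names, sample size and means are made up) and an ordinary lm fit rather than a mixed model. In that simplified setting the largest squared t ratio over all contrasts equals (k - 1) times the F statistic, while the largest pairwise t ratio can be noticeably smaller.

```r
# Simulated balanced one-way layout; all names and numbers are illustrative.
set.seed(1)
k <- 5
n <- 12
g <- factor(rep(paste0("trt", 1:k), each = n))
y <- rnorm(k * n, mean = rep(c(0, 0, 1, 0.5, 1), each = n))

fit <- lm(y ~ g)
F_stat <- anova(fit)["g", "F value"]
mse <- sum(residuals(fit)^2) / df.residual(fit)

means <- tapply(y, g, mean)

# In the balanced case, the contrast with the largest t ratio has
# coefficients proportional to each mean's deviation from the grand mean.
cc <- means - mean(means)
t_max <- sum(cc * means) / sqrt(mse * sum(cc^2) / n)

# Largest pairwise t ratio, for comparison.
pair_t <- abs(outer(means, means, "-")) / sqrt(mse * 2 / n)

c(F_times_df = (k - 1) * F_stat,
  t_max_sq = t_max^2,
  max_pair_sq = max(pair_t)^2)
```

The first two entries agree, which is the Scheffe connection described above; the third is bounded by them, so a significant F test does not imply that any single pair differs.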
Another take. The OP knows this, but to make it clear to others, here is a look at how the factors are related:
R> with(data, table(timing, intensity, year))
, , year = 2014
intensity
timing high low zero
early 11 11 0
late 13 10 0
never 0 0 12
, , year = 2015
intensity
timing high low zero
early 12 11 0
late 12 10 0
never 0 0 12
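Note that timing = "never", intensity = "zero" is a special control condition, and that only the other levels of these factors are used in combination. That is why treating them as separate factors in the model leads to difficulties in interpretation. The factor treatment, which is included in the dataset, has levels for the 5 combinations that actually occur.
Let's look at some models (I looked at residual plots, and I think the square-root transformation works best):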
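The structural problem can be seen without fitting anything. The following sketch builds a toy design frame with one row per observed combination (the level names follow the question's data; the object names design, cells, trt and X are only for illustration) and shows that a full timing * intensity parameterization asks for more coefficients than the design can support. This is presumably what lies behind the lme4 warnings and the NAs from lsmeans mentioned in the question.

```r
# One row per treatment combination that actually occurs.
design <- data.frame(
  timing    = factor(c("early", "early", "late", "late", "never")),
  intensity = factor(c("high", "low", "high", "low", "zero"))
)

# A full crossing implies 3 x 3 = 9 cells, but only 5 are observed.
cells <- table(design$timing, design$intensity)
n_empty <- sum(cells == 0)

# A single factor containing only the observed combinations.
design$trt <- interaction(design$timing, design$intensity, drop = TRUE)

# The timing * intensity model matrix has more columns than its rank.
X <- model.matrix(~ timing * intensity, data = design)

c(empty_cells = n_empty,
  trt_levels = nlevels(design$trt),
  columns = ncol(X),
  rank = qr(X)$rank)
```

A single 5-level factor, like treatment in the dataset, avoids the rank deficiency because it only has parameters for cells that exist.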
R> library("lme4")
R> m3 = lmer(sqrt(plant.leaf.g) ~ treatment + year + (1|id), data=data)
R> m4 = lmer(sqrt(plant.leaf.g) ~ treatment * year + (1|id), data=data)
Warning message:
Some predictor variables are on very different scales: consider rescaling
Comparing these two models:
R> anova(m3, m4)
refitting model(s) with ML (instead of REML)
Data: data
Models:
m3: sqrt(plant.leaf.g) ~ treatment + year + (1 | id)
m4: sqrt(plant.leaf.g) ~ treatment * year + (1 | id)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
m3 8 876.92 898.74 -430.46 860.92
m4 12 872.14 904.87 -424.07 848.14 12.783 4 0.01238
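Despite the warning message, it appears that the interaction with year should be included.
To interpret this model, the ANOVA table is of limited use. It is more informative to look at what the model predicts and to make appropriate comparisons. These predictions are called "least-squares means". We will compute them separately for each year (because of the interaction with year):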
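The numbers in the anova(m3, m4) table can be reproduced by hand, which makes it clear what is being tested. This sketch uses only the log-likelihoods and parameter counts printed in that table; the variable names are just for illustration.

```r
# Values copied from the anova(m3, m4) output
logLik_m3 <- -430.46
npar_m3 <- 8
logLik_m4 <- -424.07
npar_m4 <- 12

# Likelihood ratio statistic and its chi-square reference distribution
chisq <- 2 * (logLik_m4 - logLik_m3)
df <- npar_m4 - npar_m3
p_value <- pchisq(chisq, df = df, lower.tail = FALSE)

# AIC = -2 * logLik + 2 * (number of parameters)
aic <- -2 * c(m3 = logLik_m3, m4 = logLik_m4) + 2 * c(npar_m3, npar_m4)

c(chisq = chisq, df = df, p = p_value)
aic
```

Up to the rounding of the printed log-likelihoods, this gives the same Chisq, P value and AIC columns as the table, so the test is a 4-d.f. likelihood ratio test for the treatment-by-year interaction.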
R> library("lsmeans")
R> (m4.lsm = lsmeans(m4, ~ treatment | year, at = list(year = c(2014, 2015)), type = "response"))
year = 2014:
treatment response SE df lower.CL upper.CL
control 1082.1190 224.6040 91.46 681.9805 1574.2165
early-high 1149.8090 236.6617 97.92 728.1153 1667.4202
early-low 647.5407 180.8791 92.57 338.1453 1056.5696
late-high 490.0813 148.4696 82.72 239.2544 829.8841
late-low 485.0953 171.6529 73.61 203.3380 887.4499
year = 2015:
treatment response SE df lower.CL upper.CL
control 1393.5241 254.9350 91.35 933.1535 1945.8958
early-high 831.3529 192.9245 97.47 492.5582 1258.3147
early-low 746.0977 199.8967 96.30 402.0731 1195.6255
late-high 1050.4552 225.8614 83.42 649.2805 1547.6725
late-low 520.3968 177.7890 73.61 226.4119 934.9789
Confidence level used: 0.95
Intervals are back-transformed from the sqrt scale
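To make the footnote about back-transformation concrete, here is a small sketch that works backwards from the first row of the 2014 table (control). It assumes that the response-scale SE is a delta-method value, SE_response = 2 * estimate * SE on the sqrt scale; that is an assumption about what the software does. The point to notice is that the interval is symmetric on the sqrt scale and only becomes asymmetric after squaring.

```r
# First row of the 2014 table (control), copied from the output
resp <- 1082.1190
se_resp <- 224.6040
df <- 91.46

# Back to the sqrt scale, where the model was fitted
est_sqrt <- sqrt(resp)
se_sqrt <- se_resp / (2 * est_sqrt)  # assumed delta-method relationship

# Symmetric t interval on the sqrt scale, then squared
ci_sqrt <- est_sqrt + c(-1, 1) * qt(0.975, df) * se_sqrt
ci_resp <- ci_sqrt^2

round(rbind(sqrt_scale = ci_sqrt, response_scale = ci_resp), 2)
```

The squared limits come out close to the lower.CL and upper.CL shown for control in 2014, which suggests the assumption is reasonable for this output.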
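Finally, we can make comparisons among them. One could do all pairwise comparisons, but what seems more informative to me is to construct contrasts that form an interpretable breakdown of the 4 d.f. for treatment: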
R> trt.con = data.frame(
+ timing = c(0, 1, 1, -1, -1)/2,
+ intensity = c(0, 1, -1, 1, -1)/2,
+ tim.int = c(0, 1, -1, -1, 1)/2,
+ ctl.vs.trt = c(4, -1, -1, -1, -1)/4,
+ row.names = levels(data$treatment))
R> trt.con
timing intensity tim.int ctl.vs.trt
control 0.0 0.0 0.0 1.00
early-high 0.5 0.5 0.5 -0.25
early-low 0.5 -0.5 -0.5 -0.25
late-high -0.5 0.5 -0.5 -0.25
late-low -0.5 -0.5 0.5 -0.25
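These are the coefficients to be applied to the treatment least-squares means. For example, the one for timing compares the average of the two early timings with the average of the two late timings. The third column is the interaction, and the fourth compares the control condition with the average of the other four. Now we can test these contrasts: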
R> contrast(m4.lsm, trt.con)
year = 2014:
contrast estimate SE df t.ratio p.value
timing 7.5964984 3.586509 86.29 2.118 0.0370
intensity 4.2874559 3.569964 87.92 1.201 0.2330
tim.int 4.1745568 3.508331 94.05 1.190 0.2371
ctl.vs.trt 7.0159980 3.800634 95.97 1.846 0.0680
year = 2015:
contrast estimate SE df t.ratio p.value
timing 0.4625233 3.606278 87.74 0.128 0.8982
intensity 5.5584603 3.596573 88.42 1.545 0.1258
tim.int -4.0400583 3.529456 94.91 -1.145 0.2552
ctl.vs.trt 9.4872065 3.810418 95.75 2.490 0.0145
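One interesting result is that there is some evidence that the control condition differs from the treatments in both years, whereas the timing contrast is significant only in 2014.
(I realize this has become more of a CrossValidated-style answer than a StackExchange one; but in the end, doing something meaningful matters more than just getting a program to run.)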
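The contrast coefficients can be sanity-checked with a few lines of base R. This sketch rebuilds trt.con, confirms that each column sums to zero and that the four columns are mutually orthogonal, and then applies them to the 2014 least-squares means. The means are taken from the printed table and moved back to the sqrt scale by hand, so the results only match the contrast output up to rounding.

```r
# Contrast coefficients copied from trt.con in the answer
trt.con <- data.frame(
  timing     = c(0, 1,  1, -1, -1) / 2,
  intensity  = c(0, 1, -1,  1, -1) / 2,
  tim.int    = c(0, 1, -1, -1,  1) / 2,
  ctl.vs.trt = c(4, -1, -1, -1, -1) / 4,
  row.names  = c("control", "early-high", "early-low", "late-high", "late-low")
)
con_mat <- as.matrix(trt.con)

# Each contrast sums to zero; off-diagonal cross products are zero.
col_sums <- colSums(con_mat)
cross <- crossprod(con_mat)

# 2014 least-squares means (response scale), copied from the output
resp_2014 <- c(1082.1190, 1149.8090, 647.5407, 490.0813, 485.0953)

# Contrast estimates on the sqrt scale
est_2014 <- crossprod(con_mat, sqrt(resp_2014))[, 1]

col_sums
cross
round(est_2014, 3)
```

Because the four contrasts are orthogonal, they split the 4 d.f. for treatment into separate, interpretable pieces, which is the breakdown described above.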