如果我在R中有一个线性模型的汇总表,我怎样才能得到仅与交互估计相关联的p值,或者只是组拦截等,而不必计算行数?
例如,对于lm(y ~ x + group)
,x
为连续且group
为分类的模型,lm
对象的摘要表的估算值为:
我想找出一种方法将每个这些作为一组p值,即使组的数量或模型公式发生变化。也许汇总表以某种方式用于将行组合在一起的信息?
以下是具有两个不同模型的示例数据集。第一个模型有四组不同的p值我可能想单独得到,而第二个模型只有两组p值。
x <- 1:100
groupA <- .5*x + 10 + rnorm(length(x), 0, 1)
groupB <- .5*x + 20 + rnorm(length(x), 0, 1)
groupC <- .5*x + 30 + rnorm(length(x), 0, 1)
groupD <- .5*x + 40 + rnorm(length(x), 0, 1)
groupE <- .5*x + 50 + rnorm(length(x), 0, 1)
groupF <- .5*x + 60 + rnorm(length(x), 0, 1)
myData <- data.frame(x = x,
y = c(groupA, groupB, groupC, groupD, groupE, groupF),
group = rep(c("A","B","C","D","E","F"), each = length(x))
)
myMod1 <- lm(y ~ x + group + x:group, data = myData)
myMod2 <- lm(y ~ group + x:group - 1, data = myData)
summary(myMod1)
summary(myMod2)
答案 0 :(得分:15)
您可以通过summary()$coefficients
访问所有系数及其相关统计信息,如下所示:
> summary(myMod1)$coefficients
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.8598180335 0.207551769 47.50534335 1.882690e-203
x 0.5013049448 0.003568152 140.49427911 0.000000e+00
groupB 9.9833257879 0.293522526 34.01212819 5.343527e-141
groupC 20.0988336744 0.293522526 68.47458673 2.308586e-282
groupD 30.0671851583 0.293522526 102.43569906 0.000000e+00
groupE 39.8366758058 0.293522526 135.71931370 0.000000e+00
groupF 50.4780382104 0.293522526 171.97330259 0.000000e+00
x:groupB -0.0001115097 0.005046129 -0.02209807 9.823772e-01
x:groupC 0.0004144536 0.005046129 0.08213297 9.345689e-01
x:groupD 0.0022577223 0.005046129 0.44741668 6.547390e-01
x:groupE 0.0024544207 0.005046129 0.48639675 6.268671e-01
x:groupF -0.0052089956 0.005046129 -1.03227556 3.023674e-01
其中你只想要p值,即第4列:
> summary(myMod1)$coefficients[,4]
(Intercept) x groupB groupC groupD groupE groupF x:groupB x:groupC
1.882690e-203 0.000000e+00 5.343527e-141 2.308586e-282 0.000000e+00 0.000000e+00 0.000000e+00 9.823772e-01 9.345689e-01
x:groupD x:groupE x:groupF
6.547390e-01 6.268671e-01 3.023674e-01
最后,您只需要特定系数的p值,无论是截距还是交互项。一种方法是通过names(summary(myMod1)$coefficients[,4])
将系数名称(grepl()
)与RegEx匹配,并使用grepl
作为索引返回的逻辑向量:
> # all group dummies
> summary(myMod1)$coefficients[grepl('^group[A-F]',names(summary(myMod1)$coefficients[,4])),4]
groupB groupC groupD groupE groupF
5.343527e-141 2.308586e-282 0.000000e+00 0.000000e+00 0.000000e+00
> # all interaction terms
> summary(myMod1)$coefficients[grepl('^x:group[A-F]',names(summary(myMod1)$coefficients[,4])),4]
x:groupB x:groupC x:groupD x:groupE x:groupF
0.9823772 0.9345689 0.6547390 0.6268671 0.3023674
答案 1 :(得分:4)
现在有broom包来处理统计函数的输出。在这种情况下,使用tidy()
函数:
library(broom)
tidy(myMod1)
term estimate std.error statistic p.value
1 (Intercept) 10.0379389850 0.19497112 51.4842342 5.143448e-220
2 x 0.5009946732 0.00335187 149.4672019 0.000000e+00
3 groupB 9.8949134549 0.27573081 35.8861368 3.002513e-150
4 groupC 19.8437942091 0.27573081 71.9679981 1.021613e-293
5 groupD 29.9055587100 0.27573081 108.4592579 0.000000e+00
6 groupE 39.7258414666 0.27573081 144.0747296 0.000000e+00
7 groupF 50.1210013973 0.27573081 181.7751231 0.000000e+00
8 x:groupB -0.0005319302 0.00474026 -0.1122154 9.106909e-01
9 x:groupC -0.0010145553 0.00474026 -0.2140294 8.305983e-01
10 x:groupD -0.0025544113 0.00474026 -0.5388757 5.901766e-01
11 x:groupE 0.0045780202 0.00474026 0.9657740 3.345543e-01
12 x:groupF -0.0058636354 0.00474026 -1.2369859 2.165861e-01
结果是data.frame,因此您可以轻松过滤交互术语(名称中包含冒号):
pvals <- tidy(myMod1)[, c(1,5)]
pvals[grepl(":", pvals$term), ]
term p.value
8 x:groupB 0.9106909
9 x:groupC 0.8305983
10 x:groupD 0.5901766
11 x:groupE 0.3345543
12 x:groupF 0.2165861
broom
适用于dplyr套餐;例如,提取非交互组系数:
library(dplyr)
tidy(myMod1) %>%
select(term, p.value) %>%
filter(! grepl(":", term), term != "(Intercept)", term != "x")
term p.value
1 groupB 3.002513e-150
2 groupC 1.021613e-293
3 groupD 0.000000e+00
4 groupE 0.000000e+00
5 groupF 0.000000e+00