Getting only the p-values for the interaction coefficients in an R linear model

Asked: 2012-06-27 15:26:36

Tags: r lm

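If I have a summary table for a linear model in R, how can I get the p-values associated with just the interaction estimates, or just the group intercepts, etc., without having to count row numbers?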

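For example, with a model such as lm(y ~ x + group + x:group), where x is continuous and group is categorical, the summary table for the lm object has estimates for: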

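  1. an intercept
  2. x, the slope across all groups
  3. 5 within-group differences from the overall intercept
  4. 5 within-group differences from the overall slope

I'd like to figure out a way to pull out each of these as a group of p-values, even if the number of groups or the model formula changes. Maybe there is information the summary table uses somehow to group the rows together?

Following is an example data set with two different models. The first model has four distinct sets of p-values I might want to get separately, while the second model has only two sets of p-values.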

    x <- 1:100
    groupA <- .5*x + 10 + rnorm(length(x), 0, 1)
    groupB <- .5*x + 20 + rnorm(length(x), 0, 1)
    groupC <- .5*x + 30 + rnorm(length(x), 0, 1)
    groupD <- .5*x + 40 + rnorm(length(x), 0, 1)
    groupE <- .5*x + 50 + rnorm(length(x), 0, 1)
    groupF <- .5*x + 60 + rnorm(length(x), 0, 1)
    
    myData <- data.frame(x = x,
        y = c(groupA, groupB, groupC, groupD, groupE, groupF),
        group = rep(c("A","B","C","D","E","F"), each = length(x))
    )
    
    myMod1 <- lm(y ~ x + group + x:group, data = myData)
    myMod2 <- lm(y ~ group + x:group - 1, data = myData)
    summary(myMod1)
    summary(myMod2)
    

2 answers:

Answer 0 (score: 15)

You can access all the coefficients and their associated statistics via summary()$coefficients, like so:

> summary(myMod1)$coefficients
                 Estimate  Std. Error      t value      Pr(>|t|)
(Intercept)  9.8598180335 0.207551769  47.50534335 1.882690e-203
x            0.5013049448 0.003568152 140.49427911  0.000000e+00
groupB       9.9833257879 0.293522526  34.01212819 5.343527e-141
groupC      20.0988336744 0.293522526  68.47458673 2.308586e-282
groupD      30.0671851583 0.293522526 102.43569906  0.000000e+00
groupE      39.8366758058 0.293522526 135.71931370  0.000000e+00
groupF      50.4780382104 0.293522526 171.97330259  0.000000e+00
x:groupB    -0.0001115097 0.005046129  -0.02209807  9.823772e-01
x:groupC     0.0004144536 0.005046129   0.08213297  9.345689e-01
x:groupD     0.0022577223 0.005046129   0.44741668  6.547390e-01
x:groupE     0.0024544207 0.005046129   0.48639675  6.268671e-01
x:groupF    -0.0052089956 0.005046129  -1.03227556  3.023674e-01

Of these, you want only the p-values, i.e. the 4th column:

> summary(myMod1)$coefficients[,4]
  (Intercept)             x        groupB        groupC        groupD        groupE        groupF      x:groupB      x:groupC 
1.882690e-203  0.000000e+00 5.343527e-141 2.308586e-282  0.000000e+00  0.000000e+00  0.000000e+00  9.823772e-01  9.345689e-01 
     x:groupD      x:groupE      x:groupF 
 6.547390e-01  6.268671e-01  3.023674e-01 
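
Selecting the column by name rather than by position is a little more robust, since it does not depend on the column order. A minimal sketch, assuming a small simulated data set (the names `d` and `fit` are made up for this illustration and are not part of the question):

```r
# Illustrative data: two groups sharing one slope (d and fit are hypothetical names)
set.seed(1)
d <- data.frame(x = rep(1:20, 2), group = rep(c("A", "B"), each = 20))
d$y <- 0.5 * d$x + ifelse(d$group == "B", 10, 0) + rnorm(nrow(d))

fit <- lm(y ~ x + group + x:group, data = d)

# coef(summary(fit)) returns the same matrix as summary(fit)$coefficients
ctab <- coef(summary(fit))

# Pick the p-value column by its name instead of its position
p <- ctab[, "Pr(>|t|)"]
p
```

The result is the same named vector as with `[, 4]`, so the name-based filtering shown next works on it unchanged.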

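Finally, you want the p-values of particular coefficients only, whether intercepts or interaction terms. One way to do this is to match the coefficient names (names(summary(myMod1)$coefficients[,4])) against a regular expression with grepl(), and to use the logical vector that grepl() returns as an index: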

> # all group dummies
> summary(myMod1)$coefficients[grepl('^group[A-F]',names(summary(myMod1)$coefficients[,4])),4]
       groupB        groupC        groupD        groupE        groupF 
5.343527e-141 2.308586e-282  0.000000e+00  0.000000e+00  0.000000e+00 
> # all interaction terms
> summary(myMod1)$coefficients[grepl('^x:group[A-F]',names(summary(myMod1)$coefficients[,4])),4]
 x:groupB  x:groupC  x:groupD  x:groupE  x:groupF 
0.9823772 0.9345689 0.6547390 0.6268671 0.3023674 
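
The regular expressions above have to be adapted whenever the variable names or the formula change. As a possible alternative, the design matrix of a fitted lm carries an "assign" attribute that records which model term each column belongs to, and this can be used to split the p-values by term. The following is only a sketch under that assumption: `pvals_by_term` is a hypothetical helper written for this illustration (it is not part of any package), and `d` is made-up data with three groups.

```r
# Hypothetical helper (not from any package): split the p-values of an lm fit by model term
pvals_by_term <- function(mod) {
  ctab <- coef(summary(mod))          # aliased (NA) coefficients are not listed here
  mm <- model.matrix(mod)
  term_idx <- attr(mm, "assign")      # term number of each design-matrix column; 0 = intercept
  names(term_idx) <- colnames(mm)
  labels <- c("(Intercept)", attr(terms(mod), "term.labels"))
  # Look up the term label of every coefficient that appears in the summary table
  term_of_coef <- labels[term_idx[rownames(ctab)] + 1]
  split(ctab[, "Pr(>|t|)"], factor(term_of_coef, levels = unique(term_of_coef)))
}

# Illustrative data with three groups
set.seed(1)
d <- data.frame(x = rep(1:30, 3), group = rep(c("A", "B", "C"), each = 30))
d$y <- 0.5 * d$x + 10 * as.integer(factor(d$group)) + rnorm(nrow(d))

res1 <- pvals_by_term(lm(y ~ x + group + x:group, data = d))
res2 <- pvals_by_term(lm(y ~ group + x:group - 1, data = d))

names(res1)      # one element per term: intercept, x, group, x:group
res1[["x:group"]]  # p-values of the interaction coefficients only
length(res2)     # the second formula yields two sets of p-values
```

Because the grouping comes from the model itself, the same call should keep working when the number of groups or the formula changes, which is what the question asks for.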

Answer 1 (score: 4)

There is now the broom package for handling the output of statistical functions. In this case, use the tidy() function:

library(broom)
tidy(myMod1)

          term      estimate  std.error   statistic       p.value
1  (Intercept) 10.0379389850 0.19497112  51.4842342 5.143448e-220
2            x  0.5009946732 0.00335187 149.4672019  0.000000e+00
3       groupB  9.8949134549 0.27573081  35.8861368 3.002513e-150
4       groupC 19.8437942091 0.27573081  71.9679981 1.021613e-293
5       groupD 29.9055587100 0.27573081 108.4592579  0.000000e+00
6       groupE 39.7258414666 0.27573081 144.0747296  0.000000e+00
7       groupF 50.1210013973 0.27573081 181.7751231  0.000000e+00
8     x:groupB -0.0005319302 0.00474026  -0.1122154  9.106909e-01
9     x:groupC -0.0010145553 0.00474026  -0.2140294  8.305983e-01
10    x:groupD -0.0025544113 0.00474026  -0.5388757  5.901766e-01
11    x:groupE  0.0045780202 0.00474026   0.9657740  3.345543e-01
12    x:groupF -0.0058636354 0.00474026  -1.2369859  2.165861e-01

The result is a data.frame, so you can easily filter for the interaction terms (those with a colon in their name):

pvals <- tidy(myMod1)[, c(1,5)]
pvals[grepl(":", pvals$term), ]

       term   p.value
8  x:groupB 0.9106909
9  x:groupC 0.8305983
10 x:groupD 0.5901766
11 x:groupE 0.3345543
12 x:groupF 0.2165861

broom works well with the dplyr package; for example, to extract the non-interaction group coefficients:

library(dplyr)
tidy(myMod1) %>%
  select(term, p.value) %>%
  filter(! grepl(":", term), term != "(Intercept)", term != "x")

    term       p.value
1 groupB 3.002513e-150
2 groupC 1.021613e-293
3 groupD  0.000000e+00
4 groupE  0.000000e+00
5 groupF  0.000000e+00
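
If adding a package dependency is not an option, a similar term/p-value table can be put together in base R. The following is a rough sketch under that assumption. It does not reproduce everything tidy() does, and the names `d`, `fit`, `ptab`, `inter` and `main` are made up for this illustration.

```r
# Illustrative data with three groups (all object names here are hypothetical)
set.seed(1)
d <- data.frame(x = rep(1:30, 3), group = rep(c("A", "B", "C"), each = 30))
d$y <- 0.5 * d$x + 10 * as.integer(factor(d$group)) + rnorm(nrow(d))
fit <- lm(y ~ x + group + x:group, data = d)

# Build a two-column table of coefficient names and p-values
ctab <- coef(summary(fit))
ptab <- data.frame(term = rownames(ctab),
                   p.value = unname(ctab[, "Pr(>|t|)"]),
                   stringsAsFactors = FALSE)

# Interaction terms contain a colon; fixed = TRUE matches it literally
is_inter <- grepl(":", ptab$term, fixed = TRUE)
inter <- ptab[is_inter, ]

# Non-interaction group coefficients
main <- ptab[grepl("^group", ptab$term) & !is_inter, ]

inter
main
```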