Question

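I have a problem that I have been trying to solve for a few hours now, but I just can't figure it out (I'm new to R, by the way).

Basically, what I'm trying to do (using mtcars to illustrate) is to make R test different independent variables for the same dependent variable ("mpg"), while adjusting for "cyl" and "disp". The best solution I could come up with is: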

lm <- lapply(mtcars[,4:6], function(x) lm(mpg ~ cyl + disp + x, data = mtcars))
summary <- lapply(lm, summary)

...where 4:6 corresponds to the columns "hp", "drat" and "wt".

This works fine, but the problem is that the summary shows "x" instead of, for instance, "hp":

$hp

Call:
lm(formula = mpg ~ cyl + disp + x, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.0889 -2.0845 -0.7745  1.3972  6.9183 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 34.18492    2.59078  13.195 1.54e-13 ***
cyl         -1.22742    0.79728  -1.540   0.1349    
disp        -0.01884    0.01040  -1.811   0.0809 .  
x           -0.01468    0.01465  -1.002   0.3250    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.055 on 28 degrees of freedom
Multiple R-squared:  0.7679,    Adjusted R-squared:  0.743 
F-statistic: 30.88 on 3 and 28 DF,  p-value: 5.054e-09

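Questions:

Is there a way to fix this? And have I done this in the smartest way using lapply, or would it be better to use a for loop or some other option?

Ideally, I would also very much like to create a table showing, for instance, only the estimate and P-value for each tested variable. Can this be done somehow?

Best regards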

Answer 1

One way to get the variable name shown in the summary is to loop over the variable names and build the formula with paste0 and as.formula:

lm <- lapply(names(mtcars)[4:6], function(x) { 
  formula <- as.formula(paste0("mpg ~ cyl + disp + ", x))
  lm(formula, data = mtcars)
})
summary <- lapply(lm, summary)
summary
#> [[1]]
#> 
#> Call:
#> lm(formula = formula, data = mtcars)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -4.0889 -2.0845 -0.7745  1.3972  6.9183 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) 34.18492    2.59078  13.195 1.54e-13 ***
#> cyl         -1.22742    0.79728  -1.540   0.1349    
#> disp        -0.01884    0.01040  -1.811   0.0809 .  
#> hp          -0.01468    0.01465  -1.002   0.3250    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 3.055 on 28 degrees of freedom
#> Multiple R-squared:  0.7679, Adjusted R-squared:  0.743 
#> F-statistic: 30.88 on 3 and 28 DF,  p-value: 5.054e-09
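
Because the loop now runs over an unnamed character vector, the list elements print as [[1]], [[2]], ... rather than $hp, $drat, ... A minimal sketch of one way to keep the names, assuming it is acceptable to pass a named vector to lapply (the names `vars` and `fits` are illustrative and not part of the answer above):

```r
# Columns 4:6 of mtcars, i.e. the variables to test one at a time
vars <- names(mtcars)[4:6]

# lapply() keeps the names of its input, so naming the vector names the result
fits <- lapply(setNames(vars, vars), function(x) {
  lm(as.formula(paste0("mpg ~ cyl + disp + ", x)), data = mtcars)
})

names(fits)
#> [1] "hp"   "drat" "wt"
```

With this, `fits$hp` or `summary(fits$hp)` can be used to pick out a single model by name.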

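As for the second part of your question: one way to achieve this is to use tidy() from the broom package, which gives you a tidy data frame summarizing the regression results: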

lapply(lm, broom::tidy)
#> [[1]]
#> # A tibble: 4 x 5
#>   term        estimate std.error statistic  p.value
#>   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Intercept)  34.2       2.59       13.2  1.54e-13
#> 2 cyl          -1.23      0.797      -1.54 1.35e- 1
#> 3 disp         -0.0188    0.0104     -1.81 8.09e- 2
#> 4 hp           -0.0147    0.0147     -1.00 3.25e- 1
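
If only the estimate and p-value of the tested variable are wanted, the per-model results can be collapsed into one table. The following is a sketch in base R (it does not use broom), assuming the same three columns of mtcars; `vars`, `fits` and `result` are illustrative names:

```r
vars <- names(mtcars)[4:6]

# Fit one model per tested variable, adjusting for cyl and disp
fits <- lapply(setNames(vars, vars), function(v) {
  lm(reformulate(c("cyl", "disp", v), response = "mpg"), data = mtcars)
})

# From each coefficient matrix keep only the row of the tested variable
result <- do.call(rbind, lapply(vars, function(v) {
  co <- coef(summary(fits[[v]]))
  data.frame(term     = v,
             estimate = co[v, "Estimate"],
             p.value  = co[v, "Pr(>|t|)"],
             row.names = NULL,
             stringsAsFactors = FALSE)
}))

result
```

The result has one row per tested variable (hp, drat, wt), with the adjustment terms cyl and disp left out.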

Answer 2

We can use reformulate to create the formula for lm

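lst1 <- lapply(names(mtcars)[4:6], function(x) {
  # Build mpg ~ cyl + disp + <x> from character vectors
  fmla <- reformulate(c("cyl", "disp", x), response = "mpg")
  model <- lm(fmla, data = mtcars)
  # Overwrite the stored call with the formula text so that summary() prints it
  model$call <- deparse(fmla)
  model
})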

Then, get the summary

summary1 <- lapply(lst1, summary)
summary1[[1]]

#Call:
#"mpg ~ cyl + disp + hp"

#Residuals:
#    Min      1Q  Median      3Q     Max 
#-4.0889 -2.0845 -0.7745  1.3972  6.9183 

#Coefficients:
#            Estimate Std. Error t value Pr(>|t|)    
#(Intercept) 34.18492    2.59078  13.195 1.54e-13 ***
#cyl         -1.22742    0.79728  -1.540   0.1349    
#disp        -0.01884    0.01040  -1.811   0.0809 .  
#hp          -0.01468    0.01465  -1.002   0.3250    
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

#Residual standard error: 3.055 on 28 degrees of freedom
#Multiple R-squared:  0.7679,   Adjusted R-squared:  0.743 
#F-statistic: 30.88 on 3 and 28 DF,  p-value: 5.054e-09
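
Replacing model$call with a character string changes what is printed, but the stored call is then no longer a call object, so functions that re-evaluate it (update(), for example) may no longer work on these models. A possible alternative, sketched here with bquote() rather than taken from the answer above, is to substitute the formula into the lm call before it is evaluated (`vars` and `fits` are illustrative names):

```r
vars <- names(mtcars)[4:6]

fits <- lapply(setNames(vars, vars), function(x) {
  fmla <- reformulate(c("cyl", "disp", x), response = "mpg")
  # bquote() inserts the formula object into the call, so lm() records
  # the actual formula instead of the symbol `fmla`
  eval(bquote(lm(.(fmla), data = mtcars)))
})

fits$hp$call
```

Here the Call line of each summary should show the full formula, and the model keeps a regular call object.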

Different independent variables and a table from the summary values
