Question

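I am currently running multiple linear regressions on a series of dependent variables (nearly 200 of them). I would like to create a function that runs the regression over a specified set of columns and then extracts the relevant model estimates, such as the beta coefficients and p-values.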

Mock data:

df <- data.frame(ID          = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
                 age         = c(56, 43, 59, 74, 61, 62, 69, 80, 40, 55, 58),
                 sex         = c(0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0),
                 testscore_1 = c(23, 28, 30, 15, 7, 18, 29, 27, 14, 22, 24),
                 testscore_2 = c(1, 3, 2, 5, 8, 2, 5, 6, 7, 8, 2),
                 testscore_3 = c(18, 20, 19, 15, 20, 23, 19, 25, 10, 14, 12),
                 education   = c(5, 4, 3, 5, 2, 1, 4, 4, 3, 5, 2))

Which looks like:

    ID  age  sex   testscore_1 testscore_2  testscore_3  education
1  1001  56   0          23           1          18         5
2  1002  43   1          28           3          20         4
3  1003  59   0          30           2          19         3
4  1004  74   0          15           5          15         5
5  1005  61   1           7           8          20         2
6  1006  62   1          18           2          23         1
7  1007  69   0          29           5          19         4
8  1008  80   1          27           6          25         4
9  1009  40   0          14           7          10         3
10 1010  55   1          22           8          14         5
11 1011  58   0          24           2          12         2

I have got to the stage where the following works:

lm_results <- lapply(df[,4:6], function(x) lm(x ~ age + sex + education, 
       data = df))

From this I can extract the coefficient estimates:

Coefficient <- data.frame(coefficients = sapply(lm_results, getElement, name = "coefficients"))

This returns the coefficient for each predictor in each of the testscore_* models. However, I can't get the p-values out of these models this way:

P_values <- data.frame(p.values = sapply(lm_results, getElement, name = "qr"))

Does anyone have any suggestions?
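A minimal base-R sketch of the function the question describes, assuming responses and predictors are passed as character vectors (the names fit_models, responses and predictors are hypothetical). Building each formula with reformulate() keeps the real column name in each model instead of an anonymous x. The p-values are in the "Pr(>|t|)" column of summary(fit)$coefficients; the "qr" element used above holds the QR decomposition of the model matrix, not p-values.

```r
# Mock data from the question
df <- data.frame(ID          = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
                 age         = c(56, 43, 59, 74, 61, 62, 69, 80, 40, 55, 58),
                 sex         = c(0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0),
                 testscore_1 = c(23, 28, 30, 15, 7, 18, 29, 27, 14, 22, 24),
                 testscore_2 = c(1, 3, 2, 5, 8, 2, 5, 6, 7, 8, 2),
                 testscore_3 = c(18, 20, 19, 15, 20, 23, 19, 25, 10, 14, 12),
                 education   = c(5, 4, 3, 5, 2, 1, 4, 4, 3, 5, 2))

# Hypothetical helper: one lm() per response, all with the same predictors
fit_models <- function(data, responses, predictors) {
  # setNames() makes the returned list indexable by response name
  lapply(setNames(responses, responses), function(y) {
    lm(reformulate(predictors, response = y), data = data)
  })
}

lm_results <- fit_models(df,
                         responses  = c("testscore_1", "testscore_2", "testscore_3"),
                         predictors = c("age", "sex", "education"))

# Both matrices: rows = terms, columns = responses
Coefficient <- sapply(lm_results, coef)
P_values    <- sapply(lm_results, function(m) summary(m)$coefficients[, "Pr(>|t|)"])

round(P_values, 3)
```

For ~200 responses, responses can be built with something like grep("^testscore", names(df), value = TRUE) rather than typed out.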

Answer 1

You can actually do this with cbind and broom::tidy:

lm_results <- lm(cbind(testscore_1, testscore_2, testscore_3) ~ age + sex + education, data = df)
broom::tidy(lm_results)
# A tibble: 12 x 6
#   response    term         estimate std.error statistic p.value
#   <chr>       <chr>           <dbl>     <dbl>     <dbl>   <dbl>
# 1 testscore_1 (Intercept)  14.9       14.5       1.03    0.339 
# 2 testscore_1 age           0.0404     0.222     0.182   0.860 
# 3 testscore_1 sex          -1.47       5.09     -0.289   0.781 
# 4 testscore_1 education     1.42       1.96      0.725   0.492 
# 5 testscore_2 (Intercept)   1.83       4.93      0.371   0.721 
# 6 testscore_2 age           0.00423    0.0752    0.0562  0.957 
# 7 testscore_2 sex           1.93       1.73      1.12    0.301 
# 8 testscore_2 education     0.432      0.664     0.651   0.536 
# 9 testscore_3 (Intercept)   5.43       6.34      0.857   0.420 
#10 testscore_3 age           0.192      0.0969    1.98    0.0882
#11 testscore_3 sex           4.57       2.23      2.05    0.0794
#12 testscore_3 education    -0.359      0.856    -0.420   0.687

From ?lm:

If response is a matrix a linear model is fitted separately by least-squares to each column of the matrix.
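To illustrate that statement, a small base-R check: with a matrix response, coef() returns a matrix with one column per response, and each column should match a separate single-response fit. The object names fit_mlm, B and fit_2 are illustrative only.

```r
# Mock data from the question
df <- data.frame(ID          = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
                 age         = c(56, 43, 59, 74, 61, 62, 69, 80, 40, 55, 58),
                 sex         = c(0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0),
                 testscore_1 = c(23, 28, 30, 15, 7, 18, 29, 27, 14, 22, 24),
                 testscore_2 = c(1, 3, 2, 5, 8, 2, 5, 6, 7, 8, 2),
                 testscore_3 = c(18, 20, 19, 15, 20, 23, 19, 25, 10, 14, 12),
                 education   = c(5, 4, 3, 5, 2, 1, 4, 4, 3, 5, 2))

# One multi-response ("mlm") fit
fit_mlm <- lm(cbind(testscore_1, testscore_2, testscore_3) ~ age + sex + education, data = df)
B <- coef(fit_mlm)   # 4 x 3 matrix: rows = terms, columns = responses

# Fit testscore_2 on its own and compare with the matching column
fit_2 <- lm(testscore_2 ~ age + sex + education, data = df)
all.equal(B[, "testscore_2"], coef(fit_2))
# [1] TRUE
```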

Since you have many more variables, try:

y <- as.matrix(df[startsWith(names(df), "testscore")])
lm_results <- lm(y ~ age + sex + education, data = df)

This assumes that the names of all your dependent variables start with "testscore".
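Wrapped in a function, as the question asks, this could look like the sketch below. It uses only base R instead of broom: summary() on a multi-response fit returns one summary.lm per response column, in column order, and their coefficient tables are stacked into one long data frame with the same columns broom::tidy() produces. The function name regress_all and its arguments are hypothetical.

```r
# Mock data from the question
df <- data.frame(ID          = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
                 age         = c(56, 43, 59, 74, 61, 62, 69, 80, 40, 55, 58),
                 sex         = c(0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0),
                 testscore_1 = c(23, 28, 30, 15, 7, 18, 29, 27, 14, 22, 24),
                 testscore_2 = c(1, 3, 2, 5, 8, 2, 5, 6, 7, 8, 2),
                 testscore_3 = c(18, 20, 19, 15, 20, 23, 19, 25, 10, 14, 12),
                 education   = c(5, 4, 3, 5, 2, 1, 4, 4, 3, 5, 2))

# Hypothetical helper: regress every column whose name starts with `prefix`
# on the same predictors; returns one row per (response, term)
regress_all <- function(data, prefix, predictors) {
  # Response matrix: one column per dependent variable
  y <- as.matrix(data[startsWith(names(data), prefix)])
  # `.` expands to the predictor columns; y is found in this function's environment
  fit <- lm(y ~ ., data = data[predictors])
  # One summary.lm per response column, in column order
  sums <- summary(fit)
  rows <- lapply(seq_len(ncol(y)), function(i) {
    ct <- sums[[i]]$coefficients
    data.frame(response  = colnames(y)[i],
               term      = rownames(ct),
               estimate  = ct[, "Estimate"],
               std.error = ct[, "Std. Error"],
               statistic = ct[, "t value"],
               p.value   = ct[, "Pr(>|t|)"],
               row.names = NULL)
  })
  do.call(rbind, rows)
}

res <- regress_all(df, "testscore", c("age", "sex", "education"))
subset(res, p.value < 0.1)
```

Returning a plain long data frame keeps the result easy to filter, sort, or reshape when there are hundreds of responses.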

Answer 2

Similar to @markus's answer, this uses the broom package, but within a pipe.

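library(tidyverse)   # tidyr, dplyr, purrr
library(broom)

df %>%
  # Long format: one row per ID and testscore_* column
  pivot_longer(starts_with("testscore"), names_to = "var", values_to = "value") %>%
  # One row per response, with its data in a list-column
  nest(data = -var) %>%
  mutate(model  = map(data, ~ lm(value ~ age + sex + education, data = .x)),
         values = map(model, tidy)) %>%
  select(-data, -model) %>%
  unnest(values)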


          var        term     estimate   std.error   statistic    p.value
1  testscore_1 (Intercept) 14.899383690 14.50707597  1.02704251 0.33857568
2  testscore_1         age  0.040404308  0.22161068  0.18232112 0.86049842
3  testscore_1         sex -1.472779643  5.09169384 -0.28925141 0.78076814
4  testscore_1   education  1.419080194  1.95702802  0.72512002 0.49190076
5  testscore_2 (Intercept)  1.829852912  4.92563999  0.37149546 0.72125796
6  testscore_2         age  0.004230513  0.07524428  0.05622371 0.95673475
7  testscore_2         sex  1.931496405  1.72880123  1.11724608 0.30076331
8  testscore_2   education  0.432491820  0.66447680  0.65087572 0.53589927
9  testscore_3 (Intercept)  5.434355575  6.34277671  0.85677864 0.41992820
10 testscore_3         age  0.191860896  0.09689251  1.98014159 0.08816340
11 testscore_3         sex  4.565962111  2.22618791  2.05102278 0.07941042
12 testscore_3   education -0.359482384  0.85565084 -0.42012743 0.68698792

How to create a function to run regressions over a series of variables and extract the model estimates (coefficients, p-values)?
