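I am currently running multiple linear regressions on a series of dependent variables (nearly 200). I would like to write a function that runs the regression over a specified set of columns and then extracts the relevant model estimates, such as the beta coefficients and p-values.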
Sample data:
df = data.frame(ID = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
                age = c(56, 43, 59, 74, 61, 62, 69, 80, 40, 55, 58),
                sex = c(0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0),
                testscore_1 = c(23, 28, 30, 15, 7, 18, 29, 27, 14, 22, 24),
                testscore_2 = c(1, 3, 2, 5, 8, 2, 5, 6, 7, 8, 2),
                testscore_3 = c(18, 20, 19, 15, 20, 23, 19, 25, 10, 14, 12),
                education = c(5, 4, 3, 5, 2, 1, 4, 4, 3, 5, 2))
Which looks like:
     ID age sex testscore_1 testscore_2 testscore_3 education
1  1001  56   0          23           1          18         5
2  1002  43   1          28           3          20         4
3  1003  59   0          30           2          19         3
4  1004  74   0          15           5          15         5
5  1005  61   1           7           8          20         2
6  1006  62   1          18           2          23         1
7  1007  69   0          29           5          19         4
8  1008  80   1          27           6          25         4
9  1009  40   0          14           7          10         3
10 1010  55   1          22           8          14         5
11 1011  58   0          24           2          12         2
I have got to a stage that works:
lm_results <- lapply(df[,4:6], function(x) lm(x ~ age + sex + education,
data = df))
I can get the coefficient estimates from this:
Coefficient <- data.frame(coefficients = sapply(lm_results, getElement, name = "coefficients"))
This returns the coefficient of each predictor in each of the testscore_* models. However, I can't get the p-values out of these models:
P_values <- data.frame(p.values = sapply(lm_results, getElement, name = "qr"))
Does anyone have any suggestions?
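For the lapply approach in the question, a minimal base-R sketch (no extra packages assumed): a fitted lm object does not store p-values at all. summary() computes them and returns them in the "Pr(>|t|)" column of its coefficient matrix, so they can be collected with sapply() in the same way as the coefficients.

```r
# Sample data from the question
df <- data.frame(
  ID          = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
  age         = c(56, 43, 59, 74, 61, 62, 69, 80, 40, 55, 58),
  sex         = c(0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0),
  testscore_1 = c(23, 28, 30, 15, 7, 18, 29, 27, 14, 22, 24),
  testscore_2 = c(1, 3, 2, 5, 8, 2, 5, 6, 7, 8, 2),
  testscore_3 = c(18, 20, 19, 15, 20, 23, 19, 25, 10, 14, 12),
  education   = c(5, 4, 3, 5, 2, 1, 4, 4, 3, 5, 2)
)

# One lm fit per response column, as in the question
lm_results <- lapply(df[, 4:6], function(x) lm(x ~ age + sex + education, data = df))

# summary.lm() builds the coefficient table (Estimate, Std. Error, t value,
# Pr(>|t|)); keep only the p-value column for each model
P_values <- sapply(lm_results, function(fit) summary(fit)$coefficients[, "Pr(>|t|)"])
P_values
```

The result is a matrix with one row per term and one column per testscore_* model, the same orientation as the coefficient matrix above.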
Answer 0 (score: 3)
You can actually do this with cbind and broom::tidy:
lm_results <- lm(cbind(testscore_1, testscore_2, testscore_3) ~ age + sex + education, data = df)
broom::tidy(lm_results)
# A tibble: 12 x 6
# response term estimate std.error statistic p.value
# <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 testscore_1 (Intercept) 14.9 14.5 1.03 0.339
# 2 testscore_1 age 0.0404 0.222 0.182 0.860
# 3 testscore_1 sex -1.47 5.09 -0.289 0.781
# 4 testscore_1 education 1.42 1.96 0.725 0.492
# 5 testscore_2 (Intercept) 1.83 4.93 0.371 0.721
# 6 testscore_2 age 0.00423 0.0752 0.0562 0.957
# 7 testscore_2 sex 1.93 1.73 1.12 0.301
# 8 testscore_2 education 0.432 0.664 0.651 0.536
# 9 testscore_3 (Intercept) 5.43 6.34 0.857 0.420
#10 testscore_3 age 0.192 0.0969 1.98 0.0882
#11 testscore_3 sex 4.57 2.23 2.05 0.0794
#12 testscore_3 education -0.359 0.856 -0.420 0.687
From ?lm:
"If response is a matrix a linear model is fitted separately by least-squares to each column of the matrix."
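To see what that sentence means in practice, a small sketch comparing the coefficients of the cbind() fit with those of separate single-response fits. reformulate() is used here only to build each formula from a column name; if the documentation holds, the two coefficient matrices agree.

```r
df <- data.frame(
  age         = c(56, 43, 59, 74, 61, 62, 69, 80, 40, 55, 58),
  sex         = c(0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0),
  testscore_1 = c(23, 28, 30, 15, 7, 18, 29, 27, 14, 22, 24),
  testscore_2 = c(1, 3, 2, 5, 8, 2, 5, 6, 7, 8, 2),
  testscore_3 = c(18, 20, 19, 15, 20, 23, 19, 25, 10, 14, 12),
  education   = c(5, 4, 3, 5, 2, 1, 4, 4, 3, 5, 2)
)

# Multi-response fit: class c("mlm", "lm"), one coefficient column per response
fit_mlm <- lm(cbind(testscore_1, testscore_2, testscore_3) ~ age + sex + education, data = df)

# The same models fitted one response at a time
responses <- c("testscore_1", "testscore_2", "testscore_3")
fit_sep <- sapply(responses, function(v) {
  coef(lm(reformulate(c("age", "sex", "education"), response = v), data = df))
})

class(fit_mlm)
all.equal(coef(fit_mlm), fit_sep)  # TRUE: same estimates, column by column
```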
Since you have many more dependent variables to deal with, try:
y <- as.matrix(df[startsWith(names(df), "testscore")])
lm_results <- lm(y ~ age + sex + education, data = df)
This assumes that the names of all dependent variables start with "testscore".
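If you would rather not depend on broom, the p-values can be pulled from the multi-response fit directly. A sketch, assuming the same y matrix as above: summary() on an "mlm" fit returns a list with one summary.lm object per response column, so the same "Pr(>|t|)" extraction applies to each element.

```r
df <- data.frame(
  ID          = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
  age         = c(56, 43, 59, 74, 61, 62, 69, 80, 40, 55, 58),
  sex         = c(0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0),
  testscore_1 = c(23, 28, 30, 15, 7, 18, 29, 27, 14, 22, 24),
  testscore_2 = c(1, 3, 2, 5, 8, 2, 5, 6, 7, 8, 2),
  testscore_3 = c(18, 20, 19, 15, 20, 23, 19, 25, 10, 14, 12),
  education   = c(5, 4, 3, 5, 2, 1, 4, 4, 3, 5, 2)
)

y <- as.matrix(df[startsWith(names(df), "testscore")])
lm_results <- lm(y ~ age + sex + education, data = df)

# One summary.lm per response column, in the column order of y
sums <- summary(lm_results)
P_values <- sapply(sums, function(s) s$coefficients[, "Pr(>|t|)"])
colnames(P_values) <- colnames(y)  # label columns with the response names
P_values
```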
Answer 1 (score: 1)
Similar to @markus's answer using the broom package, but done in a pipe:
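library(tidyverse)  # attaches dplyr, tidyr and purrr
library(broom)      # installed with tidyverse, but not attached by it
df %>%
  gather(var, value, -ID, -age, -sex, -education) %>%
  nest(data = -var) %>%  # tidyr >= 1.0 syntax; nest(-var) is deprecated
  mutate(model = purrr::map(data, function(x) {
           lm(value ~ age + sex + education, data = x)}),
         values = purrr::map(model, tidy)) %>%
  select(-data, -model) %>%  # drop the list columns so only the estimates remain
  unnest(values)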
var term estimate std.error statistic p.value
1 testscore_1 (Intercept) 14.899383690 14.50707597 1.02704251 0.33857568
2 testscore_1 age 0.040404308 0.22161068 0.18232112 0.86049842
3 testscore_1 sex -1.472779643 5.09169384 -0.28925141 0.78076814
4 testscore_1 education 1.419080194 1.95702802 0.72512002 0.49190076
5 testscore_2 (Intercept) 1.829852912 4.92563999 0.37149546 0.72125796
6 testscore_2 age 0.004230513 0.07524428 0.05622371 0.95673475
7 testscore_2 sex 1.931496405 1.72880123 1.11724608 0.30076331
8 testscore_2 education 0.432491820 0.66447680 0.65087572 0.53589927
9 testscore_3 (Intercept) 5.434355575 6.34277671 0.85677864 0.41992820
10 testscore_3 age 0.191860896 0.09689251 1.98014159 0.08816340
11 testscore_3 sex 4.565962111 2.22618791 2.05102278 0.07941042
12 testscore_3 education -0.359482384 0.85565084 -0.42012743 0.68698792
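Coming back to the question's original goal of a function that runs over a specified set of columns, here is a base-R sketch that returns a long table with the same var/term/estimate/std.error/p.value columns as the output above (the statistic column is left out). The helper name run_lms and its arguments are hypothetical, chosen for illustration.

```r
df <- data.frame(
  ID          = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
  age         = c(56, 43, 59, 74, 61, 62, 69, 80, 40, 55, 58),
  sex         = c(0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0),
  testscore_1 = c(23, 28, 30, 15, 7, 18, 29, 27, 14, 22, 24),
  testscore_2 = c(1, 3, 2, 5, 8, 2, 5, 6, 7, 8, 2),
  testscore_3 = c(18, 20, 19, 15, 20, 23, 19, 25, 10, 14, 12),
  education   = c(5, 4, 3, 5, 2, 1, 4, 4, 3, 5, 2)
)

# Hypothetical helper: fit the same predictors to each response column and
# return one row per (response, term) with estimate, std. error and p-value
run_lms <- function(data, responses, predictors) {
  rows <- lapply(responses, function(v) {
    fit <- lm(reformulate(predictors, response = v), data = data)
    ct  <- summary(fit)$coefficients
    data.frame(var       = v,
               term      = rownames(ct),
               estimate  = ct[, "Estimate"],
               std.error = ct[, "Std. Error"],
               p.value   = ct[, "Pr(>|t|)"],
               row.names = NULL)
  })
  do.call(rbind, rows)  # stack the per-response tables
}

# Select the responses by name prefix, as in the first answer
res <- run_lms(df,
               responses  = grep("^testscore", names(df), value = TRUE),
               predictors = c("age", "sex", "education"))
res
```

Because responses is just a character vector, the ~200 dependent variables can be chosen by prefix (as here), by position via names(df)[...], or as an explicit list of names.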