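I want to use group_by %>% do(tidy(*)) to run multiple linear regression models and extract the model results into a data frame. For each model, the data frame should include: the outcome variable, the exposure variable, the sample size, the beta coefficient, the SE and the p-value.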
library(tidyverse)
data("mtcars")
outcomes <- c("wt", "mpg", "hp", "disp")
exposures <- c("gear", "vs", "am")
covariates <- c("drat", "qsec")
Each model should regress one outcome on one exposure, adjusting for all covariates:
lm(wt ~ factor(gear)+drat+qsec, mtcars, na.action = na.omit)
lm(wt ~ factor(vs)+drat+qsec, mtcars, na.action = na.omit)
etc...
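For a single model, the columns listed above can be pulled from broom's tidy() plus stats::nobs() for the sample size. A minimal sketch, assuming "sample size" means the number of observations actually used in the fit after na.omit, and that only the exposure coefficients are wanted; fit and res are illustrative names:

```r
library(broom)  # provides tidy()

# Fit one of the models listed above
fit <- lm(wt ~ factor(gear) + drat + qsec, data = mtcars, na.action = na.omit)

# tidy() returns term, estimate, std.error, statistic and p.value per term
res <- tidy(fit)

# Label the model and record the number of observations used by the fit
res$outcome  <- "wt"
res$exposure <- "gear"
res$n        <- nobs(fit)

# Keep only the exposure rows (here "factor(gear)4" and "factor(gear)5")
res <- res[startsWith(res$term, "factor(gear)"),
           c("outcome", "exposure", "term", "n", "estimate", "std.error", "p.value")]
res
```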
Could the final code look something like this?
models <- (mtcars %>%
gather(x_var, x_value, -c(y_var, y_i, cv1:cv3)) %>%
group_by(y_var, x_var) %>%
do(broom::tidy(lm(y_i ~ x_value + cv1 + cv2 + cv3, data = .))))
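The reshaping idea in the pseudocode above can also be made to run. A sketch, assuming tidyr's pivot_longer() (the current replacement for gather()) and the drat + qsec covariates; y_var, y_value, x_var and x_value are illustrative column names:

```r
library(tidyverse)
library(broom)

outcomes <- c("wt", "mpg", "hp", "disp")
exposures <- c("gear", "vs", "am")

models_long <- mtcars %>%
  # one row per car and outcome: y_var names the outcome, y_value holds its value
  pivot_longer(all_of(outcomes), names_to = "y_var", values_to = "y_value") %>%
  # then one row per car, outcome and exposure
  pivot_longer(all_of(exposures), names_to = "x_var", values_to = "x_value") %>%
  group_by(y_var, x_var) %>%
  # each group holds the 32 cars for one outcome/exposure pair
  do(tidy(lm(y_value ~ factor(x_value) + drat + qsec, data = .))) %>%
  ungroup()
```

The trade-off is that the data is copied 12 times (32 × 4 × 3 rows), and every exposure term is labelled factor(x_value)…, so the x_var column is what identifies the exposure.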
Answer (score: 5)
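Here is a solution that first creates the formula for each model you want to run and then picks the right variables from the dataset you want to analyse, instead of reshaping the dataset itself and applying the models to it: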
library(tidyverse)
library(broom)
outcomes <- c("wt", "mpg", "hp", "disp")
exposures <- c("gear", "vs", "am")
covariates <- c("drat", "qsec")
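expand.grid(outcomes, exposures, covariates) %>%
  group_by(Var1, Var2) %>%
  # collapse the covariates into "drat+qsec"; drop the grouping so row_number() below runs over all models
  summarise(Var3 = paste0(Var3, collapse = "+"), .groups = "drop") %>%
  rowwise() %>%
  # build one formula string per outcome/exposure pair
  summarise(frm = paste0(Var1, "~factor(", Var2, ")+", Var3)) %>%
  group_by(model_id = row_number(),
           frm) %>%
  # fit each model and tidy its coefficients
  do(tidy(lm(.$frm, data = mtcars))) %>%
  ungroup()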
# # A tibble: 52 x 7
# model_id frm term estimate std.error statistic p.value
# <int> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 1 wt~factor(gear)+drat+qsec (Intercept) 9.25 2.17 4.27 0.000218
# 2 1 wt~factor(gear)+drat+qsec factor(gear)4 -0.187 0.493 -0.378 0.708
# 3 1 wt~factor(gear)+drat+qsec factor(gear)5 -0.703 0.518 -1.36 0.186
# 4 1 wt~factor(gear)+drat+qsec drat -1.03 0.425 -2.42 0.0227
# 5 1 wt~factor(gear)+drat+qsec qsec -0.121 0.0912 -1.32 0.196
# 6 2 wt~factor(vs)+drat+qsec (Intercept) 4.35 2.28 1.91 0.0663
# 7 2 wt~factor(vs)+drat+qsec factor(vs)1 -1.04 0.416 -2.49 0.0189
# 8 2 wt~factor(vs)+drat+qsec drat -0.918 0.263 -3.49 0.00160
# 9 2 wt~factor(vs)+drat+qsec qsec 0.147 0.106 1.39 0.175
# 10 3 wt~factor(am)+drat+qsec (Intercept) 8.29 1.31 6.33 0.000000766
# # ... with 42 more rows
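In the do() version above, lm() receives the formula as a character string (.$frm) rather than a formula object. This works because lm() passes it to model.frame(), whose default method converts it with as.formula(). A small check using base R only; f, fit_chr and fit_frm are illustrative names:

```r
f <- "wt~factor(gear)+drat+qsec"

# lm() accepts the string; model.frame() converts it with as.formula() internally
fit_chr <- lm(f, data = mtcars)

# Converting explicitly gives the same model
fit_frm <- lm(as.formula(f), data = mtcars)

all.equal(coef(fit_chr), coef(fit_frm))
```

Wrapping the string in as.formula() yourself is arguably clearer, since it makes the conversion visible rather than relying on lm()'s internals.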
If you would rather use map from the purrr package instead of do, the process is very similar:
expand.grid(outcomes, exposures, covariates) %>%
  group_by(Var1, Var2) %>%
  summarise(Var3 = paste0(Var3, collapse = "+"), .groups = "drop") %>%
  rowwise() %>%
  summarise(frm = paste0(Var1, "~factor(", Var2, ")+", Var3)) %>%
  group_by(model_id = row_number()) %>%
  # fit and tidy each model, storing the result in a list-column
  mutate(model = map(frm, ~tidy(lm(., data = mtcars)))) %>%
  unnest(cols = model) %>%
  ungroup()
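A further option, not used in this answer: stats::reformulate() builds formula objects directly from a response name and a vector of term labels, which avoids pasting the "+" signs by hand and handles any number of covariates. A sketch assuming the same outcomes, exposures and covariates vectors; pairs, frms and fits are illustrative names:

```r
outcomes <- c("wt", "mpg", "hp", "disp")
exposures <- c("gear", "vs", "am")
covariates <- c("drat", "qsec")

# One row per outcome/exposure pair: 4 x 3 = 12 models
pairs <- expand.grid(outcome = outcomes, exposure = exposures,
                     stringsAsFactors = FALSE)

# reformulate() turns term labels plus a response name into a formula object
frms <- Map(function(y, x) {
  reformulate(c(paste0("factor(", x, ")"), covariates), response = y)
}, pairs$outcome, pairs$exposure)

# Fit every model against mtcars
fits <- lapply(frms, function(f) lm(f, data = mtcars))
```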
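Keep in mind that the key to this approach is building the formulas. So if you specify your variables slightly differently and can build the formulas with less code than before, the rest of the code becomes simpler: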
outcomes <- c("wt", "mpg", "hp", "disp")
exposures <- c("gear", "vs", "am")
covariate1 <- "drat"
covariate2 <- "qsec"
expand.grid(outcomes, exposures, covariate1, covariate2) %>%
  # paste0() is vectorised, so one call builds all 12 formula strings
  transmute(frm = paste0(Var1, "~factor(", Var2, ")+", Var3, "+", Var4)) %>%
  group_by(model_id = row_number()) %>%
  mutate(model = map(frm, ~tidy(lm(., data = mtcars)))) %>%
  unnest(cols = model) %>%
  ungroup()
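The tidy() output in these versions has no sample size column and keeps every term, while the question asks for outcome, exposure, n, beta, SE and p-value. A sketch of one way to get exactly those columns, assuming "sample size" means nobs() of each fit and that only the exposure coefficients are wanted; results, fit, coefs and the outcome/exposure column names are illustrative:

```r
library(tidyverse)
library(broom)

outcomes <- c("wt", "mpg", "hp", "disp")
exposures <- c("gear", "vs", "am")
covariates <- c("drat", "qsec")

results <- expand.grid(outcome = outcomes, exposure = exposures,
                       stringsAsFactors = FALSE) %>%
  mutate(
    frm = paste0(outcome, " ~ factor(", exposure, ") + ",
                 paste(covariates, collapse = " + ")),
    # fit each model once and keep it in a list-column
    fit = map(frm, ~ lm(as.formula(.x), data = mtcars)),
    # sample size = number of observations actually used by each fit
    n = map_int(fit, nobs),
    coefs = map(fit, tidy)
  ) %>%
  select(-fit) %>%
  unnest(cols = coefs) %>%
  # keep only the exposure terms, e.g. "factor(gear)4"
  filter(startsWith(term, "factor(")) %>%
  select(outcome, exposure, term, n, estimate, std.error, p.value)
```

Fitting each model once and deriving both n and the coefficient table from the stored fit avoids refitting; with missing data, n would differ between models because na.omit drops different rows.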