使用group by和tidy运行多个模型并将结果提取到数据帧

时间:2018-05-17 11:41:51

标签: r linear-regression data-manipulation

我想使用group_by %>% do(tidy(*))运行多个线性回归模型并将模型结果提取到数据框。每个模型的数据框应包括以下内容:结果变量,暴露变量,样本大小,β系数,SE和p值。

library(tidyverse)
data("mtcars")
outcomes <- c("wt, mpg", "hp", "disp")
exposures <- c("gear", "vs", "am")
covariates <- c("drat", "qsec")

模型应针对所有协变量调整每次曝光的每个结果。

lm(wt ~ factor(gear)+drat+qsec, mtcars, na.action = na.omit)
lm(wt ~ factor(vs)+drat+qsec, mtcars, na.action = na.omit)
etc...

最终代码可能看起来像这样?

models <- (mtcars %>%
gather(x_var, x_value, -c(y_var, y_i, cv1:cv3)) %>%
group_by(y_var, x_var) %>%
do(broom::tidy(lm(y_i ~ x_value + cv1 + cv2 + cv3, data = .))))

1 个答案:

答案 0 :(得分:5)

这是一个解决方案,首先为您要运行的每个模型创建公式,然后从您要分析的数据集中调用正确的变量,而不是重新整形数据集本身并应用模型:

library(tidyverse)
library(broom)

outcomes <- c("wt", "mpg", "hp", "disp")
exposures <- c("gear", "vs", "am")
covariates <- c("drat", "qsec")

expand.grid(outcomes, exposures, covariates) %>%
  group_by(Var1, Var2) %>%
  summarise(Var3 = paste0(Var3, collapse = "+")) %>%
  rowwise() %>%
  summarise(frm = paste0(Var1, "~factor(", Var2, ")+", Var3)) %>%
  group_by(model_id = row_number(),
           frm) %>%
  do(tidy(lm(.$frm, data = mtcars))) %>%
  ungroup()

# # A tibble: 52 x 7
#   model_id frm                       term          estimate std.error statistic     p.value
#      <int> <chr>                     <chr>            <dbl>     <dbl>     <dbl>       <dbl>
# 1        1 wt~factor(gear)+drat+qsec (Intercept)      9.25     2.17       4.27  0.000218   
# 2        1 wt~factor(gear)+drat+qsec factor(gear)4   -0.187    0.493     -0.378 0.708      
# 3        1 wt~factor(gear)+drat+qsec factor(gear)5   -0.703    0.518     -1.36  0.186      
# 4        1 wt~factor(gear)+drat+qsec drat            -1.03     0.425     -2.42  0.0227     
# 5        1 wt~factor(gear)+drat+qsec qsec            -0.121    0.0912    -1.32  0.196      
# 6        2 wt~factor(vs)+drat+qsec   (Intercept)      4.35     2.28       1.91  0.0663     
# 7        2 wt~factor(vs)+drat+qsec   factor(vs)1     -1.04     0.416     -2.49  0.0189     
# 8        2 wt~factor(vs)+drat+qsec   drat            -0.918    0.263     -3.49  0.00160    
# 9        2 wt~factor(vs)+drat+qsec   qsec             0.147    0.106      1.39  0.175      
# 10        3 wt~factor(am)+drat+qsec   (Intercept)      8.29     1.31       6.33  0.000000766
# # ... with 42 more rows

如果您希望使用map个包裹中的purrr代替do,则非常类似的过程:

expand.grid(outcomes, exposures, covariates) %>%
  group_by(Var1, Var2) %>%
  summarise(Var3 = paste0(Var3, collapse = "+")) %>%
  rowwise() %>%
  summarise(frm = paste0(Var1, "~factor(", Var2, ")+", Var3)) %>%
  group_by(model_id = row_number()) %>%
  mutate(model = map(frm, ~tidy(lm(., data = mtcars)))) %>%
  unnest() %>%
  ungroup()

请记住,这种方法的关键是创建公式。 因此,如果您设法以稍微不同的方式指定变量并使用比以前更少的代码创建公式,代码将变得更简单:

outcomes <- c("wt", "mpg", "hp", "disp")
exposures <- c("gear", "vs", "am")
covariate1 <- "drat"
covariate2 <- "qsec"

expand.grid(outcomes, exposures, covariate1, covariate2) %>%
  transmute(frm = paste0(Var1, "~factor(", Var2, ")+", Var3, "+", Var4)) %>%
  group_by(model_id = row_number()) %>%
  mutate(model = map(frm, ~tidy(lm(., data = mtcars)))) %>%
  unnest() %>%
  ungroup()