我需要估计同一数据集上的许多线性模型,并将回归结果全部放入一张表中。对于可重现的示例,这里使用mtcars
进行简化:
formula_1 = "mpg ~ disp"
formula_2 = "mpg ~ log(disp)"
formula_3 = "mpg ~ disp + hp"
目前,我的方法是:
purrr:map()
估算所有lm
模型。stargazer::
生成输出表。library(tidyverse)
library(stargazer)
formula_1 = "mpg ~ disp"
formula_2 = "mpg ~ log(disp)"
formula_3 = "mpg ~ disp + hp"
lst <- list(formula_1, formula_2, formula_3)
models<- lst %>% map(~lm(., mtcars))
stargazer(models, type = "text")
哪一个给我我想要的输出?
#>
#> =========================================================================================
#> Dependent variable:
#> ---------------------------------------------------------------------
#> mpg
#> (1) (2) (3)
#> -----------------------------------------------------------------------------------------
#> disp -0.041*** -0.030***
#> (0.005) (0.007)
#>
#> log(disp) -9.293***
#> (0.787)
#>
#> hp -0.025*
#> (0.013)
#>
#> Constant 29.600*** 69.205*** 30.736***
#> (1.230) (4.185) (1.332)
#>
#> -----------------------------------------------------------------------------------------
#> Observations 32 32 32
#> R2 0.718 0.823 0.748
#> Adjusted R2 0.709 0.817 0.731
#> Residual Std. Error 3.251 (df = 30) 2.579 (df = 30) 3.127 (df = 29)
#> F Statistic 76.513*** (df = 1; 30) 139.350*** (df = 1; 30) 43.095*** (df = 2; 29)
#> =========================================================================================
#> Note: *p<0.1; **p<0.05; ***p<0.01
当有许多公式时,如何将所有公式放入列表中?如果只有3个公式,下面的行将起作用,但是当要估计的模型很多时,它显得笨拙。
lst <- list(formula_1, formula_2, formula_3)
是否有使用broom
或其他方法更好的方式来完成整个任务?还是purrr:map()
是一个合理的解决方案?
答案 0 :(得分:2)
这是我建议的工作流程。我们可以使用嵌套的tibble
来构造数据,并使用broom
获得整洁的估计值和拟合值:
library(tidyverse)
library(broom)
# Created nested tibble
nested_df <- tibble(formula = c("mpg ~ disp", "mpg ~ log(disp)", "mpg ~ disp + hp")) %>%
group_by(ID = formula) %>%
group_modify(~ as_tibble(mtcars)) %>%
nest()
# Get model estimates
nested_df %>%
mutate(estimates = data %>% map2(ID, ~ tidy(lm(.y, data = .x)))) %>%
select(-data) %>%
unnest()
# Get fitted values and residuals
nested_df %>%
mutate(model = ID %>% map2(data, lm),
stats = model %>% map(augment)) %>%
select(-data, -model) %>%
unnest()
输出:
> nested_df
# A tibble: 3 x 2
ID data
<chr> <list>
1 mpg ~ disp <tibble [32 x 11]>
2 mpg ~ disp + hp <tibble [32 x 11]>
3 mpg ~ log(disp) <tibble [32 x 11]>
# A tibble: 7 x 6
ID term estimate std.error statistic p.value
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 mpg ~ disp (Intercept) 29.6 1.23 24.1 3.58e-21
2 mpg ~ disp disp -0.0412 0.00471 -8.75 9.38e-10
3 mpg ~ disp + hp (Intercept) 30.7 1.33 23.1 3.26e-20
4 mpg ~ disp + hp disp -0.0303 0.00740 -4.10 3.06e- 4
5 mpg ~ disp + hp hp -0.0248 0.0134 -1.86 7.37e- 2
6 mpg ~ log(disp) (Intercept) 69.2 4.19 16.5 1.28e-16
7 mpg ~ log(disp) log(disp) -9.29 0.787 -11.8 8.40e-13
# A tibble: 96 x 12
ID mpg disp .fitted .se.fit .resid .hat .sigma .cooksd .std.resid hp log.disp.
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 mpg ~ disp 21 160 23.0 0.664 -2.01 0.0418 3.29 0.00865 -0.630 NA NA
2 mpg ~ disp 21 160 23.0 0.664 -2.01 0.0418 3.29 0.00865 -0.630 NA NA
3 mpg ~ disp 22.8 108 25.1 0.815 -2.35 0.0629 3.28 0.0187 -0.746 NA NA
4 mpg ~ disp 21.4 258 19.0 0.589 2.43 0.0328 3.27 0.00983 0.761 NA NA
5 mpg ~ disp 18.7 360 14.8 0.838 3.94 0.0663 3.22 0.0558 1.25 NA NA
6 mpg ~ disp 18.1 225 20.3 0.575 -2.23 0.0313 3.28 0.00782 -0.696 NA NA
7 mpg ~ disp 14.3 360 14.8 0.838 -0.462 0.0663 3.31 0.000770 -0.147 NA NA
8 mpg ~ disp 24.4 147. 23.6 0.698 0.846 0.0461 3.30 0.00172 0.267 NA NA
9 mpg ~ disp 22.8 141. 23.8 0.714 -0.997 0.0482 3.30 0.00250 -0.314 NA NA
10 mpg ~ disp 19.2 168. 22.7 0.647 -3.49 0.0396 3.24 0.0248 -1.10 NA NA
# ... with 86 more rows
如果您更喜欢stargazer
表,我们也可以pull
model
列表-列:
library(stargazer)
nested_df %>%
mutate(model = ID %>% map2(data, ~ lm(.x, .y))) %>%
pull(model) %>%
stargazer(type = "text")
输出:
=========================================================================================
Dependent variable:
---------------------------------------------------------------------
mpg
(1) (2) (3)
-----------------------------------------------------------------------------------------
disp -0.041*** -0.030***
(0.005) (0.007)
hp -0.025*
(0.013)
log(disp) -9.293***
(0.787)
Constant 29.600*** 30.736*** 69.205***
(1.230) (1.332) (4.185)
-----------------------------------------------------------------------------------------
Observations 32 32 32
R2 0.718 0.748 0.823
Adjusted R2 0.709 0.731 0.817
Residual Std. Error 3.251 (df = 30) 3.127 (df = 29) 2.579 (df = 30)
F Statistic 76.513*** (df = 1; 30) 43.095*** (df = 2; 29) 139.350*** (df = 1; 30)
=========================================================================================
Note: *p<0.1; **p<0.05; ***p<0.01
请注意,group_modify
目前仍处于试验阶段,因此请谨慎使用,因为其性质和意图将来可能会发生变化。
也请参阅我的其他答案以解决相关问题:Place results of predict() in a for loop inside a list
答案 1 :(得分:1)
如果您已经将公式保存在全局环境中,则可以列出如下列表:
formula_list<-do.call("list",mget(grep("formula",names(.GlobalEnv),value=TRUE)))
为您提供:
#> $formula_1
#> [1] "mpg ~ disp"
#>
#> $formula_2
#> [1] "mpg ~ log(disp)"
#>
#> $formula_3
#> [1] "mpg ~ disp + hp"
我认为使用purrr::map
是一个好方法。但是其他人可能有更好的主意。