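I am trying to run many multiple regressions at once, each with a slightly different formula. I found a good example here: https://rpubs.com/Marcelobn/many_regressions
However, I can't quite get a different formula to run for each regression. I am looking for help fixing the updated code below, or for an alternative approach. Thanks in advance!
I am using RStudio, and what I have tried so far is shown below (example2).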
library(pwt)
library(dplyr)
library(tidyr)
library(purrr)
library(broom)
library(pander)
example <- pwt7.1
# This works great, and I still want an output like this:
multiple_growth <- example %>% select(country, openc, cg, cgdp) %>%
na.omit() %>%
nest(-country) %>%
mutate(model = map(data, ~lm(cgdp ~ openc + cg, data = .)),
tidied = map(model, tidy)) %>%
unnest(tidied)
# BUT: it assumes each of the models for each country are the same
# I want to specify different formulas for each one
example2 <- example
# I have randomly assigned them for the purpose of this example
# In reality I get to this a more methodical way!
formula1 <- paste("cgdp", "~", "openc", "+", "cg", sep = " ")
formula2 <- paste("cgdp", "~", "openc", "+", "cg", "+", "currency", "+", "ppp", sep = " ")
formula3 <- paste("cgdp", "~", "pg", "+", "kg", "+", "openc", sep = " ")
randvar = sample(c(formula1,formula2,formula3), size = nrow(example2), replace = TRUE)
example2$regress = randvar
# Run model again with slight change to lm, and it kind of works
multiple_growth_2 <- example2 %>% select(country, openc, cg, cgdp, currency, ppp, pg, kg, regress) %>%
na.omit() %>%
nest(-country, -regress) %>%
mutate(model = map(data, ~lm(as.formula(regress), data = .)), # here is where i have tried to change it
tidied = map(model, tidy)) %>%
unnest(tidied)
# This kind of works but it uses the first formula for ALL of the other countries... Any idea how to fix / an alternate method?
I would like a similar output, but with each regression using its own formula rather than just the first one in the list for all of them.
Answer 0 (score: 1)
Use map2 to iterate over the formulas and the data frames in parallel:
multiple_growth_2 <- example2 %>%
select(country, openc, cg, cgdp, currency, ppp, pg, kg, regress) %>%
na.omit() %>%
nest(-country, -regress) %>%
mutate(model = map2(data, regress, ~ lm(as.formula(.y), data = .x)),
tidied = map(model, tidy)) %>%
unnest(tidied)
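The original attempt most likely fails because, inside mutate(), regress in the map() call refers to the entire column, and as.formula() on a character vector with several elements uses only the first one. A minimal sketch of the pairing that map2 does instead, using the built-in mtcars data split by cyl as a stand-in for the nested country data (the formulas here are illustrative assumptions, not the ones from the question):

```r
library(purrr)

# One data frame per group: mtcars split by cylinder count ("4", "6", "8").
groups <- split(mtcars, mtcars$cyl)

# One formula string per group, in the same order as `groups`.
# These formulas are made up for illustration.
formulas <- c("mpg ~ wt", "mpg ~ wt + hp", "mpg ~ disp")

# map2() walks both inputs in step: .x is the i-th data frame,
# .y is the i-th formula string.
models <- map2(groups, formulas, ~ lm(as.formula(.y), data = .x))

# Each model keeps its own set of terms.
terms_per_group <- map(models, ~ names(coef(.x)))
terms_per_group
```

Because .y is a single string on every iteration, as.formula() never sees more than one formula at a time.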
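You should also remove "currency" from formula2. You are nesting by country, so most (if not all) of the data frames will contain only one currency, but at least two factor levels (i.e. currencies) are needed to form contrasts.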
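The contrast problem can be reproduced on a toy data frame. The values below are made up and stand in for one country's nested data; has_usable_factors is a hypothetical helper written for this sketch, not a function from any package.

```r
# Toy data standing in for a single country's data frame (values are invented).
one_country <- data.frame(
  cgdp     = c(310, 355, 402, 460, 515),
  openc    = c(21, 24, 26, 29, 33),
  currency = factor(rep("Lek", 5))  # only one level within this country
)

# Hypothetical helper: TRUE when every factor/character column
# has at least two distinct values.
has_usable_factors <- function(df) {
  is_cat <- vapply(
    df, function(col) is.factor(col) || is.character(col), logical(1)
  )
  all(vapply(df[is_cat], function(col) length(unique(col)) >= 2, logical(1)))
}

# lm() cannot build contrasts for a one-level factor, so it raises an error;
# tryCatch() captures the message instead of stopping the script.
fit <- tryCatch(
  lm(cgdp ~ openc + currency, data = one_country),
  error = function(e) conditionMessage(e)
)

has_usable_factors(one_country)
fit
```

A check like this could be run on each nested data frame before fitting, so that formulas containing a constant factor are skipped or adjusted rather than failing inside the pipeline.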
Answer 1 (score: 1)
Since you are fitting the models across the whole dataset, you can keep the formulas (or models) in a separate object and add them afterwards with tidyr::crossing:
library(pwt, quietly = TRUE, warn.conflicts = FALSE)
library(dplyr, quietly = TRUE, warn.conflicts = FALSE)
library(tidyr)
library(purrr)
library(broom)
example <- as_tibble(pwt7.1)
formulas <- c(
formula1 = paste("cgdp", "~", "openc", "+", "cg", sep = " "),
formula2 = paste("cgdp", "~", "openc", "+", "cg", "+", "ppp", sep = " "),
formula3 = paste("cgdp", "~", "pg", "+", "kg", "+", "openc", sep = " ")
)
multiple_growth_2 <- example %>%
select(country, openc, cg, cgdp, currency, ppp, pg, kg) %>%
na.omit() %>%
nest(-country) %>%
tidyr::crossing(. , formulas) %>%
mutate(model = pmap(list(x = data, y = formulas), function(x, y) lm( as.formula(y), data = x))
)
# --- Use broom to
# evaluate models
multiple_growth_2 %>%
mutate(model_glance = map(model, glance) ) %>%
unnest(model_glance) %>%
select(-data, -model)
#> # A tibble: 570 x 13
#> country formulas r.squared adj.r.squared sigma statistic p.value df
#> <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 Afghan~ cgdp ~ ~ 0.550 0.527 179. 23.2 2.56e- 7 3
#> 2 Afghan~ cgdp ~ ~ 0.551 0.514 181. 15.1 1.39e- 6 4
#> 3 Afghan~ cgdp ~ ~ 0.599 0.567 171. 18.5 1.74e- 7 4
#> 4 Albania cgdp ~ ~ 0.519 0.494 1247. 20.5 9.17e- 7 3
#> 5 Albania cgdp ~ ~ 0.746 0.726 917. 36.3 4.09e-11 4
#> 6 Albania cgdp ~ ~ 0.626 0.596 1114. 20.7 4.93e- 8 4
#> 7 Algeria cgdp ~ ~ 0.0754 0.0368 1916. 1.96 1.52e- 1 3
#> 8 Algeria cgdp ~ ~ 0.824 0.813 844. 73.5 9.02e-18 4
#> 9 Algeria cgdp ~ ~ 0.482 0.449 1449. 14.6 7.58e- 7 4
#> 10 Angola cgdp ~ ~ 0.581 0.559 971. 26.4 6.56e- 8 3
#> # ... with 560 more rows, and 5 more variables: logLik <dbl>, AIC <dbl>,
#> # BIC <dbl>, deviance <dbl>, df.residual <int>
# check coefficient
multiple_growth_2 %>%
mutate(model_tidy = map(model, tidy) ) %>%
unnest(model_tidy)
#> # A tibble: 2,089 x 7
#> country formulas term estimate std.error statistic p.value
#> <fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Afghanis~ cgdp ~ openc +~ (Inter~ 255. 77.7 3.28 2.21e-3
#> 2 Afghanis~ cgdp ~ openc +~ openc -5.03 1.09 -4.60 4.63e-5
#> 3 Afghanis~ cgdp ~ openc +~ cg 70.0 10.3 6.80 4.55e-8
#> 4 Afghanis~ cgdp ~ openc +~ (Inter~ 230. 130. 1.78 8.38e-2
#> 5 Afghanis~ cgdp ~ openc +~ openc -4.82 1.40 -3.45 1.41e-3
#> 6 Afghanis~ cgdp ~ openc +~ cg 72.7 15.3 4.76 2.92e-5
#> 7 Afghanis~ cgdp ~ openc +~ ppp -1.88 7.79 -0.241 8.11e-1
#> 8 Afghanis~ cgdp ~ pg + kg~ (Inter~ 452. 101. 4.46 7.38e-5
#> 9 Afghanis~ cgdp ~ pg + kg~ pg -6.11 2.40 -2.54 1.53e-2
#> 10 Afghanis~ cgdp ~ pg + kg~ kg 64.2 9.67 6.63 8.76e-8
#> # ... with 2,079 more rows
# check individual prediction
multiple_growth_2 %>%
mutate(model_augment = map(model, augment) ) %>%
unnest(model_augment)
#> # A tibble: 26,820 x 15
#> country formulas cgdp openc cg .fitted .se.fit .resid .hat .sigma
#> <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Afghan~ cgdp ~ ~ 247. 21.7 5.28 515. 42.5 -267. 0.0562 176.
#> 2 Afghan~ cgdp ~ ~ 241. 27.1 5.73 520. 39.3 -278. 0.0481 175.
#> 3 Afghan~ cgdp ~ ~ 240. 32.9 6.11 517. 36.7 -277. 0.0419 176.
#> 4 Afghan~ cgdp ~ ~ 273. 27.7 5.74 518. 39.1 -245. 0.0476 177.
#> 5 Afghan~ cgdp ~ ~ 324. 28.9 5.36 485. 40.7 -160. 0.0517 180.
#> 6 Afghan~ cgdp ~ ~ 363. 26.9 6.99 609. 36.2 -246. 0.0408 177.
#> 7 Afghan~ cgdp ~ ~ 410. 28.1 6.60 576. 36.3 -167. 0.0409 179.
#> 8 Afghan~ cgdp ~ ~ 441. 26.5 6.97 610. 36.4 -169. 0.0413 179.
#> 9 Afghan~ cgdp ~ ~ 487. 24.7 7.08 626. 37.3 -139. 0.0434 180.
#> 10 Afghan~ cgdp ~ ~ 505. 26.4 7.07 617. 36.4 -112. 0.0413 181.
#> # ... with 26,810 more rows, and 5 more variables: .cooksd <dbl>,
#> # .std.resid <dbl>, ppp <dbl>, pg <dbl>, kg <dbl>
Note: I used purrr::pmap in order to give a different answer (purrr::map2 would do the job as well!).
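For comparison, the cross-join idea behind crossing can be sketched in base R with expand.grid: every group is paired with every formula, so the number of fitted models is the number of groups times the number of formulas. This is consistent with the 570 rows in the glance output above (three formulas per country). The sketch uses the built-in mtcars data and made-up formulas as stand-ins; it is not the answer's own code.

```r
# One data frame per group: mtcars split by cylinder count ("4", "6", "8").
groups <- split(mtcars, mtcars$cyl)

# Candidate formulas, made up for illustration.
formulas <- c("mpg ~ wt", "mpg ~ wt + hp")

# All group/formula combinations: 3 groups x 2 formulas = 6 rows.
grid <- expand.grid(
  group = names(groups),
  formula = formulas,
  stringsAsFactors = FALSE
)

# Fit one model per row, looking the data frame up by group name.
models <- Map(
  function(g, f) lm(as.formula(f), data = groups[[g]]),
  grid$group, grid$formula
)

# Attach one summary statistic per model, similar in spirit to broom::glance().
grid$r.squared <- vapply(
  models, function(m) summary(m)$r.squared, numeric(1), USE.NAMES = FALSE
)
grid
```

Unlike the map2 approach, where each group gets exactly one formula, this fits every formula to every group, which is useful when the goal is to compare candidate models per group.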