Here is some example data.
ind1 <- rnorm(99)
ind2 <- rnorm(99)
ind3 <- rnorm(99)
ind4 <- rnorm(99)
ind5 <- rnorm(99)
dep <- rnorm(99, mean=ind1)
group <- rep(c("A", "B", "C"), each=33)
df <- data.frame(dep,group, ind1, ind2, ind3, ind4, ind5)
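After grouping df by the categorical variable, I fitted a linear regression model for every combination of the independent variables. The results are satisfactory, but my original data has more than 5 variables, so the resulting list is hard to read and compare. I would therefore like to select the best 5 models for each group from the result list (tibble_list), based on their AIC values. I would appreciate any help.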
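library(tidyverse)
library(broom)

# All combinations of 1 to 5 predictors
indvar_list <- lapply(1:5, function(x)
  combn(paste0("ind", 1:5), x, simplify = FALSE))
# One formula per combination
formulas_list <- rapply(indvar_list, function(x)
  as.formula(paste("dep ~", paste(x, collapse = "+"))))
# Fit formula f separately within each group and collect the summaries
run_model <- function(f) {
  df %>%
    nest(data = -group) %>%
    mutate(fit = map(data, ~ lm(f, data = .)),
           # keep only the needed columns so the two unnest() calls
           # do not clash on the shared names statistic and p.value
           results1 = map(fit, ~ select(glance(.), r.squared, p.value, AIC)),
           results2 = map(fit, ~ select(tidy(.), term, estimate))) %>%
    unnest(results1) %>%
    unnest(results2) %>%
    select(group, term, estimate, r.squared, p.value, AIC) %>%
    mutate(estimate = exp(estimate))
}
tibble_list <- lapply(formulas_list, run_model)
tibble_list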
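To see why the list becomes hard to compare as predictors are added, the number of candidate formulas can be counted directly. This is only a sketch in base R; the names `vars`, `combos` and `formulas` are illustrative and not part of the code above, and `reformulate()` is used here as an alternative to pasting the formula string by hand.

```r
# Illustrative names; the predictors match the example data above.
vars <- paste0("ind", 1:5)

# Every non-empty subset of the predictors, as a flat list of character vectors.
combos <- unlist(
  lapply(seq_along(vars), function(k) combn(vars, k, simplify = FALSE)),
  recursive = FALSE
)

# One formula per subset, e.g. dep ~ ind1 + ind3.
formulas <- lapply(combos, function(v) reformulate(v, response = "dep"))

length(formulas)  # 31, i.e. 2^5 - 1
```

With p predictors there are 2^p - 1 such models per group, so the list roughly doubles with each extra variable.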
Answer 0: (score: 1)
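One option is to bind the rows into a single dataset with an .id column, then arrange by 'group' and 'AIC', group by 'group', and filter the rows whose 'index' is among the first five unique values. Because the rows are already sorted by AIC within each group, those are the five models with the lowest AIC.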
library(tidyverse)
bind_rows(tibble_list, .id = 'index') %>%
arrange(group, AIC) %>%
group_by(group) %>%
filter(index %in% head(unique(index), 5))
# A tibble: 51 x 7
# Groups:   group [3]
#    index group term        estimate r.squared  p.value   AIC
#    <chr> <fct> <chr>          <dbl>     <dbl>    <dbl> <dbl>
#  1 1     A     (Intercept)    0.897     0.319 0.000620  79.5
#  2 1     A     ind1           2.07      0.319 0.000620  79.5
#  3 7     A     (Intercept)    0.883     0.358 0.00129   79.5
#  4 7     A     ind1           2.14      0.358 0.00129   79.5
#  5 7     A     ind3           0.849     0.358 0.00129   79.5
#  6 8     A     (Intercept)    0.890     0.351 0.00153   79.9
#  7 8     A     ind1           2.12      0.351 0.00153   79.9
#  8 8     A     ind4           0.860     0.351 0.00153   79.9
#  9 19    A     (Intercept)    0.877     0.387 0.00237   80.0
# 10 19    A     ind1           2.18      0.387 0.00237   80.0
# … with 41 more rows
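If you would rather make the ranking step explicit, a possible variant is to rank the models on a one-row-per-model table and then join the coefficient rows back. The following is only a sketch: `toy` is a made-up stand-in for `bind_rows(tibble_list, .id = 'index')`, and `n_best`, `best_models` and `top_terms` are names chosen for illustration. It assumes dplyr 1.0.0 or later for `slice_min()`.

```r
library(dplyr)

# Made-up stand-in for the bound data: one row per coefficient,
# with the model-level AIC repeated on every row of a model.
toy <- tibble(
  index = c("1", "1", "2", "2", "3", "3", "1", "1", "2", "2", "3", "3"),
  group = rep(c("A", "B"), each = 6),
  term  = rep(c("(Intercept)", "x"), times = 6),
  AIC   = c(80, 80, 75, 75, 90, 90, 60, 60, 70, 70, 65, 65)
)

n_best <- 2  # models to keep per group (5 in the question)

# Collapse to one row per (group, model), then keep the lowest-AIC models.
best_models <- toy %>%
  distinct(group, index, AIC) %>%
  group_by(group) %>%
  slice_min(AIC, n = n_best, with_ties = FALSE) %>%
  ungroup() %>%
  select(group, index)

# Keep only the coefficient rows that belong to the selected models.
top_terms <- toy %>%
  semi_join(best_models, by = c("group", "index")) %>%
  arrange(group, AIC)
```

On the real data you would replace `toy` with the bound tibble and set `n_best <- 5`. With `with_ties = FALSE`, exactly `n_best` models are kept per group even when AIC values coincide.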