Question

这是示例数据。

ind1 <- rnorm(99)
ind2 <- rnorm(99)
ind3 <- rnorm(99)
ind4 <- rnorm(99)
ind5 <- rnorm(99)
dep <- rnorm(99, mean=ind1)
group <- rep(c("A", "B", "C"), each=33)
df <- data.frame(dep,group, ind1, ind2, ind3, ind4, ind5)

在将df中的每个变量组合按类别变量分组之后，这里已经拟合了简单的线性回归模型。结果令人满意。但是我的原始数据有5个以上的变量。很难看到和比较此列表中的结果。因此，我想根据AIC值从结果列表（tibble_list）中为每个组选择最佳的5种模型。如果有人可以帮助我，将不胜感激。

indvar_list <- lapply(1:5, function(x) 
  combn(paste0("ind", 1:5), x, , simplify = FALSE))

formulas_list <- rapply(indvar_list, function(x)
  as.formula(paste("dep ~", paste(x, collapse="+"))))

run_model <- function(f) {    
  df %>% 
    nest(-group) %>% 
    mutate(fit = map(data, ~ lm(f, data = .)),
           results1 = map(fit, glance),
           results2 = map(fit, tidy)) %>% 
    unnest(results1) %>% 
    unnest(results2) %>% 
    select(group, term, estimate, r.squared, p.value, AIC) %>% 
    mutate(estimate = exp(estimate))
}

tibble_list <- lapply(formulas_list, run_model)
tibble_list

Answer 1

一种选择是将行绑定到具有.id列的单个数据集中，然后将arrange按“组”，“ AIC”，按“组”，filter分组具有前五个unique'索引'

的行

library(tidyverse)
bind_rows(tibble_list, .id = 'index') %>% 
    arrange(group, AIC) %>% 
    group_by(group) %>% 
    filter(index %in% head(unique(index), 5)) 
# A tibble: 51 x 7
# Groups:   group [3]
#   index group term        estimate r.squared  p.value   AIC
#   <chr> <fct> <chr>          <dbl>     <dbl>    <dbl> <dbl>
# 1 1     A     (Intercept)    0.897     0.319 0.000620  79.5
# 2 1     A     ind1           2.07      0.319 0.000620  79.5
# 3 7     A     (Intercept)    0.883     0.358 0.00129   79.5
# 4 7     A     ind1           2.14      0.358 0.00129   79.5
# 5 7     A     ind3           0.849     0.358 0.00129   79.5
# 6 8     A     (Intercept)    0.890     0.351 0.00153   79.9
# 7 8     A     ind1           2.12      0.351 0.00153   79.9
# 8 8     A     ind4           0.860     0.351 0.00153   79.9
# 9 19    A     (Intercept)    0.877     0.387 0.00237   80.0
#10 19    A     ind1           2.18      0.387 0.00237   80.0
## … with 41 more rows

排名并选择显示各组最佳结果的5个模型

1 个答案: