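Is there a way in R to run a GLM on every combination of the explanatory variables in a data frame? For example,
if I have 4 explanatory variables, I could model Y as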
m1 = glm(Y ~ V1, data = d)
m2 = glm(Y ~ V1 + V2, data = d)
m3 = glm(Y ~ V1 + V2 + V3, data = d)
m4 = glm(Y ~ V1 + V2 + V3 + V4, data = d)
However, I could also fit
m5 = glm(Y ~ V1 + V2 + V4, data = d)
and so on.
Is there a way in R to fit every combination of the variables in the data frame, so I can see which variables are the best predictors?
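Answer 1:
This is called dredging. The dredge() function in the MuMIn package fits every subset of the terms in a global model (an lm() or a glm()) and ranks the resulting models by AICc by default: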
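library(MuMIn)
# dredge() refuses to run unless missing values make model fitting fail
options(na.action = "na.fail")
data(Cement)
fm1 <- lm(y ~ ., data = Cement)  # global model containing all predictors X1..X4
dd <- dredge(fm1)                # fit every subset of the predictors
dd                               # print the model selection table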
Global model call: lm(formula = y ~ ., data = Cement)
---
Model selection table
   (Intrc)    X1      X2      X3      X4 df  logLik  AICc delta weight
4    52.58 1.468  0.6623                  4 -28.156  69.3  0.00  0.566
12   71.65 1.452  0.4161         -0.2365  5 -26.933  72.4  3.13  0.119
8    48.19 1.696  0.6569  0.2500          5 -26.952  72.5  3.16  0.116
10  103.10 1.440                 -0.6140  4 -29.817  72.6  3.32  0.107
14  111.70 1.052         -0.4100 -0.6428  5 -27.310  73.2  3.88  0.081
15  203.60       -0.9234 -1.4480 -1.5570  5 -29.734  78.0  8.73  0.007
16   62.41 1.551  0.5102  0.1019 -0.1441  6 -26.918  79.8 10.52  0.003
13  131.30               -1.2000 -0.7246  4 -35.372  83.7 14.43  0.000
7    72.07        0.7313 -1.0080          4 -40.965  94.9 25.62  0.000
9   117.60                       -0.7382  3 -45.872 100.4 31.10  0.000
3    57.42        0.7891                  3 -46.035 100.7 31.42  0.000
11   94.16        0.3109         -0.4569  4 -45.761 104.5 35.21  0.000
2    81.48 1.869                          3 -48.206 105.1 35.77  0.000
6    72.35 2.312          0.4945          4 -48.005 109.0 39.70  0.000
5   110.20               -1.2560          3 -50.980 110.6 41.31  0.000
1    95.42                                2 -53.168 111.5 42.22  0.000
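The weight column can be reproduced by hand. The sketch below assumes it holds Akaike weights computed from AICc, which is MuMIn's default ranking criterion. Under that assumption each weight is exp(-delta/2) normalised over all models. The code uses base R and the rounded delta column copied from the table above, so the results should match the weight column only to roughly two or three decimal places. The names delta, rel_lik and weight are illustrative.

```r
# Rounded "delta" values copied from the model selection table, keyed by model ID.
# delta_i = AICc_i - min(AICc)
delta <- c(`4` = 0.00, `12` = 3.13, `8` = 3.16, `10` = 3.32, `14` = 3.88,
           `15` = 8.73, `16` = 10.52, `13` = 14.43, `7` = 25.62, `9` = 31.10,
           `3` = 31.42, `11` = 35.21, `2` = 35.77, `6` = 39.70, `5` = 41.31,
           `1` = 42.22)

# Relative likelihood of each model given the data
rel_lik <- exp(-delta / 2)

# Akaike weights: normalise so the weights of all 16 models sum to 1
weight <- rel_lik / sum(rel_lik)

print(round(weight, 3))
```

A weight near 0.57 for model 4 (X1 + X2) says that, within this candidate set, it carries most of the support. It does not prove that X1 and X2 are the "true" predictors.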
Answer 2:
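If you want to stick to base R, without a package that does the dredging for you, you can use the combn() function to build a list of every possible GLM object: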
# Toy data: a response Y and four explanatory variables V1..V4
d <- data.frame(replicate(5, rnorm(10)))
names(d) <- c('Y', paste0('V', 1:4))
dep_var <- 'Y'
indep_vars <- setdiff(names(d), dep_var)
# For each subset size (1 to 4 predictors), enumerate every combination with
# combn(), fit a GLM for it, and store the fit under its formula string
glms <- Reduce(append, lapply(seq_along(indep_vars),
  function(num_vars) {
    Reduce(append, apply(combn(length(indep_vars), num_vars), 2, function(vars) {
      # e.g. "Y~V1+V3"
      formula_string <- paste(c(dep_var, paste(indep_vars[vars], collapse = "+")), collapse = '~')
      structure(list(glm(as.formula(formula_string), data = d)), .Names = formula_string)
    }))
  }
))
print(names(glms))
# [1] "Y~V1" "Y~V2" "Y~V3" "Y~V4" "Y~V1+V2" "Y~V1+V3" "Y~V1+V4" "Y~V2+V3" "Y~V2+V4" "Y~V3+V4" "Y~V1+V2+V3" "Y~V1+V2+V4"
# [13] "Y~V1+V3+V4" "Y~V2+V3+V4" "Y~V1+V2+V3+V4"
print(glms[["Y~V2+V3+V4"]])
# Call: glm(formula = as.formula(formula_string), data = d)
#
# Coefficients:
# (Intercept) V2 V3 V4
# 0.12721 0.04748 0.11369 -0.04258
# Degrees of Freedom: 9 Total (i.e. Null); 6 Residual
# Null Deviance: 8.932
# Residual Deviance: 8.695 AIC: 36.98
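The question also asks which variables are the best predictors, so the fitted models still have to be compared. Below is a minimal base-R sketch that assumes the same kind of toy data: a response Y and predictors V1..V4, seeded with set.seed(1) so the run is reproducible. It builds the 15 non-empty subsets with combn(..., simplify = FALSE), fits each one with reformulate(), and sorts the models by AIC. The names subsets, fits, aic_table and best are illustrative. Ranking by plain AIC is one choice among several; MuMIn's dredge() uses AICc by default. With only 10 observations, treat the ranking as a demonstration rather than a reliable variable-selection result.

```r
# Toy data (assumption): response Y and four predictors V1..V4
set.seed(1)
d <- data.frame(replicate(5, rnorm(10)))
names(d) <- c("Y", paste0("V", 1:4))

indep_vars <- setdiff(names(d), "Y")

# Every non-empty subset of the predictors: 4 + 6 + 4 + 1 = 15 character vectors
subsets <- unlist(lapply(seq_along(indep_vars),
                         function(k) combn(indep_vars, k, simplify = FALSE)),
                  recursive = FALSE)

# reformulate() turns c("V1", "V3") into the formula Y ~ V1 + V3
fits <- lapply(subsets, function(vars) glm(reformulate(vars, response = "Y"), data = d))
names(fits) <- vapply(subsets, paste, character(1), collapse = "+")

# Lower AIC is better; sort so the best-supported subset comes first
aic_table <- sort(vapply(fits, AIC, numeric(1)))
print(aic_table)

# The lowest-AIC model and its formula
best <- fits[[names(aic_table)[1]]]
print(formula(best))
```

The same sort(vapply(glms, AIC, numeric(1))) call can be applied directly to the glms list built in this answer.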