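I am running an analysis in which I select variables in two steps. Step 1: from each of two sets of variables (for example, intrinsic and extrinsic variables), select the best variables and variable combinations. Step 2: take the best variable combinations from each set and build a new set of models containing only the combinations allowed by the variables preselected in each set. I am using the dredge function from the MuMIn package.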
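I can easily configure the variable combinations by hand, as shown below.
Step 1: select variable combinations for each set of variables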
library(MuMIn)
data(Cement)
# Choosing between X1 & X2 - Set of variable #1
fm1 <- lm(y ~ X1 + X2, Cement, na.action = na.fail)
m1 <- dredge(fm1)
sm1 <- subset(m1, delta < 32) # delta < 32 is only chosen to have 2 selected models
sm1
# Global model call: lm(formula = y ~ X1 + X2, data = Cement, na.action = na.fail)
# ---
# Model selection table
#   (Intrc)    X1     X2 df  logLik  AICc delta weight
# 4   52.58 1.468 0.6623  4 -28.156  69.3  0.00      1
# 3   57.42       0.7891  3 -46.035 100.7 31.42      0
# Choosing between X3 & X4 - Set of variable #2
fm2 <- lm(y ~ X3 + X4, Cement, na.action = na.fail)
m2 <- dredge(fm2)
sm2 <- subset(m2, delta < 20)
sm2
# Global model call: lm(formula = y ~ X3 + X4, data = Cement, na.action = na.fail)
# ---
# Model selection table
#   (Intrc)   X3      X4 df  logLik  AICc delta weight
# 4   131.3 -1.2 -0.7246  4 -35.372  83.7  0.00      1
# 3   117.6      -0.7382  3 -45.872 100.4 16.67      0
Step 2: look at models containing both sets of variables, but including only the combinations selected above for each set.
# Only looking at the combinations chosen above with subset.
fm3 <- lm(y ~., Cement, na.action = na.fail)
m3 <- dredge(fm3, subset = ((X1 & X2) | X2) & ((X3 & X4) | X4))
m3
# Global model call: lm(formula = y ~ ., data = Cement, na.action = na.fail)
# ---
# Model selection table
#    (Intrc)    X1      X2      X3      X4 df  logLik  AICc delta weight
# 12   71.65 1.452  0.4161         -0.2365  5 -26.933  72.4  0.00  0.921
# 15  203.60       -0.9234 -1.4480 -1.5570  5 -29.734  78.0  5.60  0.056
# 16   62.41 1.551  0.5102  0.1019 -0.1441  6 -26.918  79.8  7.40  0.023
# 11   94.16        0.3109         -0.4569  4 -45.761 104.5 32.08  0.000
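Using subset works well when you only have a few variables, but in my case I am selecting among many more variables in each set.
Is there a way to do the same thing without having to specify the variables in subset by hand?
Thanks a lot!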
Answer 0 (score: 2)
This is not the most elegant solution, but it does what you want.
library(MuMIn)
options(na.action = na.fail)
fm1 <- lm(y ~ X1 + X2, Cement)
m1 <- dredge(fm1)
ms1 <- subset(m1, delta < 32)
fm2 <- lm(y ~ X3 + X4, Cement)
m2 <- dredge(fm2)
ms2 <- subset(m2, delta < 20)
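# Logical matrices: TRUE where a term is present in a selected model
a1 <- !is.na(ms1[, attr(ms1, "terms")])
a2 <- !is.na(ms2[, attr(ms2, "terms")])
# Term labels of both sets; the intercept is written as "1" in a formula
allterms <- c(attr(ms1, "terms"), attr(ms2, "terms"))
allterms[allterms == "(Intercept)"] <- "1"
n1 <- nrow(a1)
n2 <- nrow(a2)
res <- vector("list", n1 * n2)
k <- 0L
# Refit one model for every pairing of a set-1 model with a set-2 model
for (i in seq_len(n1)) for (j in seq_len(n2)) {
  frm <- reformulate(allterms[c(a1[i, ], a2[j, ])], response = ".")
  res[[k <- k + 1L]] <- update(fm1, formula = frm)
}
# Rank all refitted models in a single selection table
model.sel(res)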
# Model selection table
#    (Intrc)    X1     X2     X3      X4 df  logLik  AICc delta weight
#  4   52.58 1.468 0.6623                 4 -28.156  69.3  0.00  0.566
#  2   71.65 1.452 0.4161        -0.2365  5 -26.933  72.4  3.13  0.119
#  3   48.19 1.696 0.6569 0.2500          5 -26.952  72.5  3.16  0.116
# 10  103.10 1.440               -0.6140  4 -29.817  72.6  3.32  0.107
# ...
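The pairing logic in the loop above can be checked without fitting any models. What follows is a minimal base R sketch under stated assumptions, not part of the original answer: the two logical matrices are written by hand and only stand in for a1 and a2 (they are not produced by MuMIn here), the intercept columns are left out and "1" is added to every formula instead, and the response is written as y rather than ".". The names combos and formulas are illustrative.

```r
# Hypothetical presence matrices standing in for a1 and a2:
# one row per selected model, one column per term, TRUE = term is in the model.
a1 <- rbind(c(X1 = TRUE,  X2 = TRUE),
            c(X1 = FALSE, X2 = TRUE))
a2 <- rbind(c(X3 = TRUE,  X4 = TRUE),
            c(X3 = FALSE, X4 = TRUE))

# Every pairing of a row of a1 with a row of a2
combos <- expand.grid(i = seq_len(nrow(a1)), j = seq_len(nrow(a2)))

# One formula per pairing; "1" keeps the intercept and also covers
# the case where a pairing selects no terms at all
formulas <- lapply(seq_len(nrow(combos)), function(k) {
  keep <- c(a1[combos$i[k], ], a2[combos$j[k], ])
  reformulate(c("1", names(keep)[keep]), response = "y")
})

formulas
```

Each formula could then be fitted (for example with lm on the Cement data) and the fits passed to model.sel, as in the answer. Using expand.grid instead of nested loops keeps the number of variable sets easy to extend, since a third set would only add one more column to combos.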