Thanks to this post regarding the failure of stepwise variable selection in lm.
I have data like the following, as described in that post:
set.seed(1) # for reproducible example
x <- sample(1:500,500) # need this so predictors are not perfectly correlated.
x <- matrix(x,nc=5) # 100 rows, 5 cols
y <- 1+ 3*x[,1]+2*x[,2]+4*x[,5]+rnorm(100) # y depends on variables 1, 2, 5 only
# you start here...
df <- data.frame(y,as.matrix(x))
full.model <- lm(y ~ ., df) # include all predictors
step(full.model,direction="backward")
What I need is to select only the 5 best variables out of these 20, and then the 6 best variables. Does anyone know how to do this?
Answer (score: 0)
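MuMIn::dredge() can restrict the number of terms in each candidate model through its m.lim argument, given as c(lower, upper) and counted excluding the intercept.
[Note]: the number of combinations, and therefore the run time, grows exponentially with the number of predictors.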
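To put the note in numbers (a quick illustration using base R's choose(), with p = 20 as in the example below): an unrestricted all-subsets search over p predictors has 2^p candidate models, while fixing the model size at exactly k predictors leaves choose(p, k) of them. That is far fewer, but still a lot of model fits.

```r
p <- 20            # number of candidate predictors in the example below
n5  <- choose(p, 5)  # models with exactly 5 predictors
n6  <- choose(p, 6)  # models with exactly 6 predictors
all <- 2^p           # every subset, including the empty model
c(k5 = n5, k6 = n6, all = all)
```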
set.seed(1) # for reproducible example
x <- sample(100*20)
x <- matrix(x, nc = 20) # 100 rows, 20 predictors
y <- 1 + 2*x[,1] + 3*x[,2] + 4*x[,3] + 5*x[,7] + 6*x[,8] + 7*x[,9] + rnorm(100) # y depends on variables 1,2,3,7,8,9 only
df <- data.frame(y, as.matrix(x))
full.model <- lm(y ~ ., df) # include all predictors
library(MuMIn)
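options(na.action = "na.fail") # dredge() stops with an error if the global model relies on R's default na.omit
dd5 <- dredge(full.model, m.lim = c(5, 5), trace = 2) # exactly 5 predictors; trace = 2 shows a progress bar; top model: X2, X3, X7, X8, X9
dd6 <- dredge(full.model, m.lim = c(6, 6), trace = 2) # exactly 6 predictors; should recover X1, X2, X3, X7, X8, X9
head(dd5) # candidate models, ranked by AICc by default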
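For readers without MuMIn, the same fixed-size search can be sketched in base R. best_subset_k is a hypothetical helper, not part of any package: it enumerates every subset of exactly k predictors with combn(), fits each one with the low-level .lm.fit(), and keeps the subset with the smallest residual sum of squares. Because every candidate of size k has the same number of parameters, ranking by RSS gives the same order as ranking by AIC, AICc or BIC, so under these assumptions it should pick the same top model as dredge() with m.lim = c(k, k). It is exhaustive, so the cost warning above applies in full.

```r
# Hypothetical helper: exhaustive search over all subsets of exactly k predictors.
# With k fixed, minimising RSS is equivalent to minimising AIC, AICc or BIC.
best_subset_k <- function(data, response, k) {
  preds  <- setdiff(names(data), response)
  y      <- data[[response]]
  X_all  <- as.matrix(data[preds])  # convert once, outside the loop
  combos <- combn(preds, k)         # one column per candidate subset
  rss <- apply(combos, 2, function(vars) {
    X <- cbind(1, X_all[, vars, drop = FALSE])  # intercept + chosen predictors
    sum(.lm.fit(X, y)$residuals^2)
  })
  best <- combos[, which.min(rss)]
  list(vars  = best,
       rss   = min(rss),
       model = lm(reformulate(best, response), data = data))  # refit for summary() etc.
}

# Same simulated data as in the answer above
set.seed(1)
x  <- matrix(sample(100*20), nc = 20)
y  <- 1 + 2*x[,1] + 3*x[,2] + 4*x[,3] + 5*x[,7] + 6*x[,8] + 7*x[,9] + rnorm(100)
df <- data.frame(y, x)  # columns y, X1, ..., X20

best5 <- best_subset_k(df, "y", 5)  # 15504 candidate fits
best6 <- best_subset_k(df, "y", 6)  # 38760 candidate fits
best5$vars
best6$vars
```

summary(best6$model) then gives the usual lm output for the selected subset.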