Question

Thanks to this post regarding the failure of stepwise variable selection in lm.

I have data like the following, as described in that post:

set.seed(1)            # for reproducible example
x <- sample(1:500,500) # need this so predictors are not perfectly correlated.
x <- matrix(x,nc=5)    # 100 rows, 5 cols
y <- 1+ 3*x[,1]+2*x[,2]+4*x[,5]+rnorm(100)  # y depends on variables 1, 2, 5 only

# you start here...
df <- data.frame(y,as.matrix(x))
full.model <- lm(y ~ ., df)                 # include all predictors
step(full.model,direction="backward")

What I need is to select only the best 5 variables out of these 20, and then the best 6. Does anyone know how to do this?

Answer 1

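MuMIn::dredge() can limit the number of terms in each candidate model (intercept excluded) through its m.lim argument.
[Note]: the number of combinations, and therefore the time required, grows exponentially with the number of predictors.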
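To put the note in numbers, here is a small base R calculation for p = 20 predictors. It assumes the intercept is always kept, so a "model" is just a subset of the predictors. It gives 15504 five-term models, 38760 six-term models and 2^20 = 1048576 subsets in total. Fixing the size with m.lim = c(k, k) therefore keeps only choose(20, k) of the possible subsets as candidates.

```r
# Count candidate models for p = 20 predictors (intercept always included)
p <- 20
n5 <- choose(p, 5)            # subsets with exactly 5 terms
n6 <- choose(p, 6)            # subsets with exactly 6 terms
n_all <- sum(choose(p, 0:p))  # subsets of every size, equal to 2^p
print(c(five = n5, six = n6, all = n_all))
```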

set.seed(1)            # for reproducible example
x <- sample(100*20)
x <- matrix(x, nc = 20)     # 100 rows, 20 predictors
y <- 1 + 2*x[,1] + 3*x[,2] + 4*x[,3] + 5*x[,7] + 6*x[,8] + 7*x[,9] + rnorm(100)  # y depends on variables 1,2,3,7,8,9 only

df <- data.frame(y, as.matrix(x))
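full.model <- lm(y ~ ., df, na.action = na.fail)  # include all predictors; dredge() stops with an error unless na.action is na.fail

library(MuMIn)

# alternatively: options(na.action = "na.fail") before calling dredge()
dredge(full.model, m.lim = c(5, 5), trace = 2)   # trace = 2: a progress bar is displayed; result: X2, X3, X7, X8, X9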
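If MuMIn is not available, the same fixed-size search can be sketched in base R by enumerating every subset with combn() and fitting each one with .lm.fit(). This is a swapped-in brute-force approach, not what dredge() does internally. best_subset() is a hypothetical helper. "Best" is assumed to mean the lowest residual sum of squares. For a fixed number of terms on the same data, that ranking coincides with AIC, AICc (dredge()'s default rank) and BIC, so it should agree with dredge(..., m.lim = c(k, k)).

```r
# Brute-force best subset of a fixed size k, base R only (a sketch).
# Assumption: "best" = lowest RSS; with k fixed this ranks models the same
# way as AIC, AICc or BIC on the same data.
set.seed(1)
x <- matrix(sample(100 * 20), ncol = 20)   # 100 rows, 20 predictors
y <- 1 + 2*x[,1] + 3*x[,2] + 4*x[,3] + 5*x[,7] + 6*x[,8] + 7*x[,9] + rnorm(100)
df <- data.frame(y, x)                     # columns: y, X1, ..., X20

# Hypothetical helper, not part of any package
best_subset <- function(data, response, k) {
  predictors <- setdiff(names(data), response)
  X <- as.matrix(data[predictors])
  yy <- data[[response]]
  combos <- combn(predictors, k)           # one candidate subset per column
  rss <- apply(combos, 2, function(vars) {
    fit <- .lm.fit(cbind(1, X[, vars]), yy)  # intercept + k predictors
    sum(fit$residuals^2)
  })
  combos[, which.min(rss)]                 # names of the winning predictors
}

best5 <- best_subset(df, "y", 5)
best6 <- best_subset(df, "y", 6)
print(best5)
print(best6)

# Refit the chosen subset as an ordinary lm object for summary(), predict(), ...
best6_model <- lm(reformulate(best6, response = "y"), data = df)
```

The cost is one fit per subset, i.e. choose(p, k) fits, so the exponential-growth caveat above applies here as well.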

How do I set the threshold in the Step package?
