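I have spent the whole day trying to solve this problem, so please help. The example here is deliberately simple, but my real data has many variables, about 2,000. To run a regression I therefore need to select a subset of them, and because I have to develop many models, the selection has to be automated.
After selecting the variables, I run a rolling regression to make predictions.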
library(car)
library(zoo)  # provides rollapply()
# run regression on all predictors
m <- lm(mpg ~ ., data = mtcars)
# run stepwise selection
s <- step(m, direction = "both")
# select variables kept by step()
variable <- attr(s$terms, "term.labels")
dep <- "mpg"  # dependent variable
b <- paste(dep, paste(variable, collapse = "+"), sep = "~")
rollapply(mtcars, width = 2,
          FUN = function(z) coef(lm(b, data = as.data.frame(z))),
          by.column = FALSE, align = "right")
# This is the automated version I developed
models2 <- lapply(1:11, function(x) {
  dep <- names(mtcars)[x]
  ind <- mtcars[-x]
  w <- names(ind)
  indep <- paste(dep, paste(w, collapse = "+"), sep = "~")
  m <- lm(indep, data = mtcars)
  s <- step(m, direction = "both")
  variable <- attr(s$terms, "term.labels")  # terms kept by step()
  b <- paste(dep, paste(variable, collapse = "+"), sep = "~")
  rollapply(mtcars, width = 2,
            FUN = function(z) coef(lm(b, data = as.data.frame(z))),
            by.column = FALSE, align = "right")
})
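As an aside on the formula-building step used in the snippets above: instead of pasting strings together, the selected terms can be turned into a formula object with `reformulate()`, or the formula can be read directly off the fitted model with `formula()`. The following is only a minimal sketch on `mtcars` with `mpg` as the response; the names `kept`, `b1` and `b2` are illustrative and not part of the original code.

```r
# Sketch: turn the terms kept by step() into a formula object.
m <- lm(mpg ~ ., data = mtcars)
s <- step(m, direction = "both", trace = 0)  # trace = 0 silences the step log

kept <- attr(terms(s), "term.labels")        # names of the selected predictors
b1 <- reformulate(kept, response = "mpg")    # builds mpg ~ <kept terms>
b2 <- formula(s)                             # same model, read off the fit

b1
```

Both `b1` and `b2` can be passed to `lm()` inside the rolling window, which avoids having to track the selected variable names by hand.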
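I want to compute predictions from the rolling regression.
However, it is very hard to set up the data.frame for prediction without knowing the independent variables in advance.
There is a similar question here, but in that model the independent variables are already known.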
Answer 0 (score: 0)
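You don't need to know the independent variables! If you supply a data.frame that contains all the variables, the predict function will pick out the ones it needs. Similar to the post you linked, you can do something like this: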
mtcars[, "int"] <- seq(nrow(mtcars)) # add a row index, used to choose newdata
models2 <- lapply(1:11, function(x) {
  dep <- names(mtcars)[x]
  w <- setdiff(names(mtcars), c(dep, "int")) # candidate predictors; leave out the helper index
  form <- paste(dep, paste(w, collapse = "+"), sep = "~")
  m <- lm(form, data = mtcars)
  s <- step(m, direction = "both", trace = 0) # model selection (don't print trace)
  b <- formula(s) # This is clearer than your version
  rpl <- rollapply(mtcars, width = 20, # width = 2 leaves too few observations per window to estimate the model
                   FUN = function(z) {
                     nextD <- max(z[, "int"]) + 1 # index of row for new data
                     fit <- lm(b, data = as.data.frame(z)) # fit the model
                     c(coef = coef(fit), # coefficients
                       predicted = predict(fit, newdata = mtcars[nextD, ])) # predict using the next row (NA for the last window, which has no next row)
                   },
                   by.column = FALSE, align = "right")
  rpl
})
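To see the idea in isolation, here is a minimal base-R sketch of one-step-ahead rolling prediction, without zoo. It is not the code above: the formula `mpg ~ wt + hp` is an assumed stand-in for whatever `step()` selects, and the names `width`, `ends`, `pred` and `forecast` are illustrative. The point it shows is that `newdata` can be a full row of `mtcars`, and `predict()` uses only the columns the model needs.

```r
# Sketch: one-step-ahead rolling forecasts in base R, using mtcars.
# The formula below is an illustrative assumption, not a step() result.
form  <- mpg ~ wt + hp
width <- 20                 # rows per estimation window
n     <- nrow(mtcars)

# The window ending at row `end` is used to predict row `end + 1`,
# so the last usable window ends at row n - 1.
ends <- width:(n - 1)

pred <- sapply(ends, function(end) {
  train <- mtcars[(end - width + 1):end, ]
  fit   <- lm(form, data = train)
  # newdata holds every column of mtcars; predict() uses only wt and hp
  unname(predict(fit, newdata = mtcars[end + 1, ]))
})

# Line up each prediction with the observed value it was forecasting
forecast <- data.frame(row       = ends + 1,
                       actual    = mtcars$mpg[ends + 1],
                       predicted = pred)
head(forecast)
```

Stopping the windows at row `n - 1` avoids the missing prediction that a window ending on the final row would produce, since there is no following row to predict.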