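I have a data frame of 12 predictor variables and a numeric vector named BEI, which I want to predict. I want to run stepwise selection on every window of 12 rows, e.g. 1:12, 2:13, and so on. For each rolling window, I want to return the coefficients and use them to predict BEI. Here is my code: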
k = length(BEI)
coef.list <- numeric()
predicted.list <- numeric()
for(i in 1:(k-11)){
  BEI.subset <- BEI[i:(i+11)]
  predictors.subset <- predictors[c(i:(i+11)),]
  fit.stepwise <- regsubsets(BEI.subset~., data = predictors.subset, nvmax = 10, method = "forward")
  fit.summary <- summary(fit.stepwise)
  id <- which.min(fit.summary$cp)
  coefficients <- coef(fit.stepwise,id)
  coef.list <- append(coef.list, coefficients)
  form <- as.formula(fit.stepwise$call[[2]])
  mat <- model.matrix(form,predictors.subset)
  predicted.stepwise <- mat[,names(coefficients)]%*%coefficients
  predicted.list <- append(predicted.list, predicted.stepwise)
}
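I get output like this:
Reordering variables and trying again:
There were 50 or more warnings (use warnings() to see the first 50)
The warnings are:
1: In leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, ... : 1 linear dependencies found
2: In leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, ... : 1 linear dependencies found
3: In leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, ... : 1 linear dependencies found
...... and so on.
How can I fix this? Or is there a better way to write this code?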
Answer 0 (score: 0)
The reason you are getting the error is missing values (NA) in the rolling subsets of the data.
Take data(swiss) as an example:
dim(swiss)
# [1] 47 6
split_swiss <- lapply(1:nrow(swiss), function(x) swiss[x:(x+11),])
length(split_swiss)
# [1] 47 ## the rolling subsets produce 47 data.frames
lapply(tail(split_swiss), head) # show the first 6 rows of the last 6 data.frames
[[1]]
             Fertility Agriculture Examination Education Catholic Infant.Mortality
Neuchatel         64.4        17.6          35        32    16.92             23.0
Val de Ruz        77.6        37.6          15         7     4.97             20.0
ValdeTravers      67.6        18.7          25         7     8.65             19.5
V. De Geneve      35.0         1.2          37        53    42.34             18.0
Rive Droite       44.7        46.6          16        29    50.43             18.2
Rive Gauche       42.8        27.7          22        29    58.33             19.3

[[2]]
             Fertility Agriculture Examination Education Catholic Infant.Mortality
Val de Ruz        77.6        37.6          15         7     4.97             20.0
ValdeTravers      67.6        18.7          25         7     8.65             19.5
V. De Geneve      35.0         1.2          37        53    42.34             18.0
Rive Droite       44.7        46.6          16        29    50.43             18.2
Rive Gauche       42.8        27.7          22        29    58.33             19.3
NA                  NA          NA          NA        NA       NA               NA

[[3]]
             Fertility Agriculture Examination Education Catholic Infant.Mortality
ValdeTravers      67.6        18.7          25         7     8.65             19.5
V. De Geneve      35.0         1.2          37        53    42.34             18.0
Rive Droite       44.7        46.6          16        29    50.43             18.2
Rive Gauche       42.8        27.7          22        29    58.33             19.3
NA                  NA          NA          NA        NA       NA               NA
NA.1                NA          NA          NA        NA       NA               NA

[[4]]
             Fertility Agriculture Examination Education Catholic Infant.Mortality
V. De Geneve      35.0         1.2          37        53    42.34             18.0
Rive Droite       44.7        46.6          16        29    50.43             18.2
Rive Gauche       42.8        27.7          22        29    58.33             19.3
NA                  NA          NA          NA        NA       NA               NA
NA.1                NA          NA          NA        NA       NA               NA
NA.2                NA          NA          NA        NA       NA               NA

[[5]]
            Fertility Agriculture Examination Education Catholic Infant.Mortality
Rive Droite      44.7        46.6          16        29    50.43             18.2
Rive Gauche      42.8        27.7          22        29    58.33             19.3
NA                 NA          NA          NA        NA       NA               NA
NA.1               NA          NA          NA        NA       NA               NA
NA.2               NA          NA          NA        NA       NA               NA
NA.3               NA          NA          NA        NA       NA               NA

[[6]]
            Fertility Agriculture Examination Education Catholic Infant.Mortality
Rive Gauche      42.8        27.7          22        29    58.33             19.3
NA                 NA          NA          NA        NA       NA               NA
NA.1               NA          NA          NA        NA       NA               NA
NA.2               NA          NA          NA        NA       NA               NA
NA.3               NA          NA          NA        NA       NA               NA
NA.4               NA          NA          NA        NA       NA               NA
If you run regsubsets on these data.frames, you get errors in the windows where, once the NA rows are dropped, there are more predictors than cases.
lapply(split_swiss, function(x) regsubsets(Fertility ~., data=x, nvmax=10, method="forward"))
Error in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in = force.in, :
  y and x different lengths
In addition: Warning messages:
1: In leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in = force.in, :
  1 linear dependencies found
......
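One way to see which windows are affected is to count the complete rows in each one and compare that count with the number of coefficients a full model would need. The following is only a sketch using base R on the swiss data; the names window, n_complete, n_predictors and n_coef are introduced here for illustration and do not appear in the original code.

```r
data(swiss)

window <- 12  # assumed window length, matching the 12-row subsets above

# Count the rows without any NA in each rolling window.
# Indexing past the last row of a data.frame yields all-NA rows,
# which complete.cases() flags as incomplete.
n_complete <- sapply(seq_len(nrow(swiss)), function(i) {
  sum(complete.cases(swiss[i:(i + window - 1), ]))
})

# A full model of Fertility on all other columns needs one coefficient
# per predictor plus an intercept.
n_predictors <- ncol(swiss) - 1
n_coef <- n_predictors + 1

table(full_window = n_complete == window)        # windows with all 12 rows present
which(n_complete < n_coef)                       # windows with fewer cases than coefficients
```

Windows that start too close to the end of the data are the ones padded with NA rows, so they are the natural candidates to drop before fitting.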
To avoid this, you can keep only the subsets that have 12 complete rows and then run the regression:
split_swiss_2 <- split_swiss[sapply(lapply(split_swiss, na.omit), nrow) == 12]
lapply(split_swiss_2, function(x) regsubsets(Fertility ~., data=x, nvmax=10, method="forward"))
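To also return the coefficients and predictions for each window, as the question asks, something along the following lines might work. This is a rough sketch on the swiss data rather than a tested solution for the BEI data: it assumes the leaps package is installed, builds only windows that fit entirely inside the data (so no NA rows arise in the first place), uses nvmax = 5 because swiss has five predictors, and picks the model size by lowest Cp as in the question. The names window, starts and results are illustrative.

```r
library(leaps)  # provides regsubsets()

data(swiss)
window <- 12

# Start positions for windows that lie fully inside the data.
starts <- seq_len(nrow(swiss) - window + 1)

results <- lapply(starts, function(i) {
  sub <- swiss[i:(i + window - 1), ]

  # Forward stepwise selection within this window.
  fit <- regsubsets(Fertility ~ ., data = sub, nvmax = 5, method = "forward")

  # Pick the model size with the smallest Cp, as in the question.
  id <- which.min(summary(fit)$cp)
  beta <- coef(fit, id)  # named vector, including the intercept

  # Rebuild the design matrix and keep only the selected columns.
  X <- model.matrix(Fertility ~ ., data = sub)
  fitted <- drop(X[, names(beta), drop = FALSE] %*% beta)

  list(coef = beta, fitted = fitted)
})

results[[1]]$coef    # coefficients selected for rows 1:12
results[[1]]$fitted  # in-window predictions for rows 1:12
```

Storing each window's output as a list element keeps the coefficient names attached, which append() on a plain numeric vector would flatten together across windows.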