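fold cross-validation method to compare the residuals of the 19 models fitted with forward selection. I am stuck at the last step. The code looks like this: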
library(ISLR)
summary(Hitters) # Use the dataset of Hitters
Hitters = na.omit(Hitters)
library(leaps)
set.seed(11)
folds = sample(rep(1:10,length=nrow(Hitters))) # used for cross validation later
table(folds)
cv.errors = matrix(NA,10,19)
# store the errors: one row per fold (10), one column per model size (19)
# write a prediction function
predict.regsubsets = function(object,newdata,id,...){
  form = as.formula(object$call[[2]]) # extract the formula from the original call
  mat = model.matrix(form,newdata) # build the design matrix for the new data
  coefi = coef(object,id=id) # coefficients of the model with id predictors
  return(mat[,names(coefi)]%*%coefi) # compute the predicted values manually
}
# write a function to compute the mean squared error of each model
error = function(object,newdata,origin,num,...){
  pred = lapply(seq_along(1:num),function(x){predict.regsubsets(object,newdata,id=x)})
  sapply(pred,function(x){mean((x-origin)^2)})
}
# this gives error: $ operator is invalid for atomic vectors
lapply(seq_along(1:10),function(X){
  best.fit = regsubsets(Salary~.,data=Hitters[folds!=X,],nvmax=19,method="forward")
  cv.errors[X,]=error(best.fit,newdata=Hitters[folds==X,],origin=Hitters$Salary[folds==X],num=19)
})
# this works well, except for being slow...
for(X in 1:10){
  best.fit = regsubsets(Salary~.,data=Hitters[folds!=X,],nvmax=19,method="forward")
  cv.errors[X,]=error(best.fit,newdata=Hitters[folds==X,],origin=Hitters$Salary[folds==X],num=19)
}
Thanks!
Answer 0 (score: 0)
This may be a scoping problem. I have not tested it, but try the following:
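lapply(1:10,function(X){
  best.fit = regsubsets(Salary~.,data=Hitters[folds!=X,],nvmax=19,method="forward")
  cv.errors[X,] <<- error(best.fit,newdata=Hitters[folds==X,],origin=Hitters$Salary[folds==X],num=19)
})
The only change here is the assignment operator: `<<-` assigns to `cv.errors` in the enclosing environment, whereas `=` or `<-` inside the function only modifies a local copy.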
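As background on why the assignment operator matters inside `lapply`: within the anonymous function, `=` and `<-` both write to a local copy of the matrix, so the matrix in the calling environment stays untouched, while `<<-` writes to the variable in the enclosing environment. Below is a minimal base-R sketch of that difference on a toy matrix `m` (a hypothetical stand-in for `cv.errors`, with made-up dimensions and values). It also shows the side-effect-free alternative of returning each row and binding the rows afterwards.

```r
# Toy stand-in for cv.errors: 3 "folds" by 2 "models" (hypothetical sizes).
m <- matrix(NA_real_, 3, 2)

# `<-` (or `=`) inside the function assigns to a local copy of m.
invisible(lapply(1:3, function(i) { m[i, ] <- c(i, i^2) }))
local_unchanged <- all(is.na(m))   # TRUE: the outer m is still all NA

# `<<-` assigns to m in the enclosing environment instead.
invisible(lapply(1:3, function(i) { m[i, ] <<- c(i, i^2) }))
filled_by_superassign <- m

# Alternative without side effects: return each row, then bind the rows.
rows <- lapply(1:3, function(i) c(i, i^2))
filled_by_rbind <- do.call(rbind, rows)
```

Returning the rows and calling `do.call(rbind, ...)` is usually the easier pattern to reason about, because the function passed to `lapply` then does not depend on any variable outside itself.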
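Also, I am not sure whether this will solve your problem, but do you really need `seq_along` here? Plain `1:10` is enough, since `seq_along(1:10)` returns the same vector.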
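If the `$` error persists, one possible cause (an assumption that has not been verified against the leaps source) is that `object$call` no longer holds the `regsubsets(...)` call when the fit is created inside another function. In that case `object$call[[2]]` is not the formula and `as.formula()` fails. The sketch below sidesteps the question by passing the formula explicitly and by returning each fold's errors from `lapply` rather than assigning into `cv.errors`. The names `predict_with_formula`, `cv_one_fold`, `form` and `mean.cv.errors` are hypothetical and do not come from the original post.

```r
library(ISLR)   # Hitters data
library(leaps)  # regsubsets()

Hitters <- na.omit(Hitters)
set.seed(11)
folds <- sample(rep(1:10, length = nrow(Hitters)))
form <- Salary ~ .  # kept in a variable so it can be passed around explicitly

# Hypothetical helper: same arithmetic as predict.regsubsets in the question,
# but the formula is an argument instead of being read from object$call.
predict_with_formula <- function(object, newdata, id, form) {
  mat <- model.matrix(form, newdata)          # design matrix for the new data
  coefi <- coef(object, id = id)              # coefficients of the id-variable model
  drop(mat[, names(coefi), drop = FALSE] %*% coefi)
}

# Hypothetical helper: test MSE of all 19 model sizes for one held-out fold.
cv_one_fold <- function(k) {
  train <- Hitters[folds != k, ]
  test <- Hitters[folds == k, ]
  fit <- regsubsets(form, data = train, nvmax = 19, method = "forward")
  sapply(1:19, function(id) {
    mean((predict_with_formula(fit, test, id, form) - test$Salary)^2)
  })
}

# One row per fold, one column per model size.
cv.errors <- do.call(rbind, lapply(1:10, cv_one_fold))
mean.cv.errors <- colMeans(cv.errors)  # average test MSE per model size
```

`which.min(mean.cv.errors)` then gives the model size with the lowest cross-validated error. Note that `lapply` is unlikely to be noticeably faster than the `for` loop here, since most of the time is presumably spent inside `regsubsets` itself.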