Creating a single training/test set that works with both regsubsets and ridge/lasso regression

Time: 2018-04-08 06:33:14

Tags: r regression training-data

I am working through exercise 6.9 at the following link: https://notendur.hi.is/map27/ISLR/ISLRChapter6.html

Specifically, the instructor creates the training and test datasets this way:

set.seed(1)
trainRows = sample(dim(College)[1], ceiling(dim(College)[1]/2))
train = is.element(c(1:dim(College)[1]),trainRows)
test = !train

The "train" and "test" objects are now logical vectors. These training and test objects work with lm, ridge and lasso:

#Linear Regression
fit = lm(Apps~., data=College[train, ])
fit.pred = predict(fit, College[test, ])
mean((College[test, ][, "Apps"] - fit.pred)^2)
#Ridge Regression
trainMat = model.matrix(Apps~., data=College[train, ])
testMat = model.matrix(Apps~., data=College[test, ])
grid = 10 ^ seq(10, -10, length=100)
ridgeModel = cv.glmnet(trainMat, College[train, ][, "Apps"], alpha=0, 
lambda=grid)
optLambda = ridgeModel$lambda.min
optLambda
#Ridge MSE
ridgePred = predict(ridgeModel, newx=testMat, s=optLambda)
mean((College[test, ][, "Apps"] - ridgePred)^2)
#Lasso Model
lassoModel = cv.glmnet(trainMat, College[train, ][, "Apps"], alpha=1, 
lambda=grid)
optLambda = lassoModel$lambda.min
optLambda
#Test MSE - Lasso
lassoPred = predict(lassoModel, newx=testMat, s=optLambda)
mean((College[test, ][, "Apps"] - lassoPred)^2)

However, when I try to use these objects with the regsubsets function, I get the following message:

regfit.full=regsubsets(Apps ~ ., data = train)

Error in terms.formula(formula, data = data) : '.' in formula and no 'data' argument
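The error is consistent with `train` being a logical row mask rather than a data frame, so the formula's `.` has nothing to expand against. A minimal sketch, assuming the ISLR and leaps packages are installed, that subsets `College` with the mask in the same way the `lm()` call above does (`nvmax = 17` is my own choice so that all predictors are considered):

```r
library(ISLR)   # provides the College data set
library(leaps)  # provides regsubsets()

set.seed(1)
trainRows <- sample(nrow(College), ceiling(nrow(College) / 2))
train <- is.element(seq_len(nrow(College)), trainRows)

class(train)  # "logical": a row mask, not a data frame

# Pass the subsetted data frame, not the mask itself
regfit.full <- regsubsets(Apps ~ ., data = College[train, ], nvmax = 17)
summary(regfit.full)$adjr2
```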

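However, when I create the training/test objects in the following way, they work with regsubsets:

index = sample(1:nrow(College), size=0.5*nrow(College))
train_2 = College[index,]
test_2 = College[-index,]

I now have two training and test sets (one for ridge and lasso regression with the glmnet package, one for regsubsets), which is obviously not ideal. Is there a way to create just one training/test set that can be used with both glmnet and regsubsets?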
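One possible way to share a single split, sketched here under the assumption that the ISLR, leaps and glmnet packages are installed: keep the `train_2` / `test_2` data frames for regsubsets and build the glmnet matrices from those same data frames with `model.matrix()`. The names `xTrain` and `xTest`, the `floor()` call, and dropping the intercept column with `[, -1]` are my own choices, not part of the exercise.

```r
library(ISLR)    # College data
library(leaps)   # regsubsets()
library(glmnet)  # cv.glmnet()

set.seed(1)
index <- sample(seq_len(nrow(College)), size = floor(0.5 * nrow(College)))
train_2 <- College[index, ]
test_2 <- College[-index, ]

# regsubsets accepts the training data frame directly
regfit <- regsubsets(Apps ~ ., data = train_2, nvmax = 17)

# glmnet needs a numeric matrix and a response vector;
# both are derived from the same two data frames
xTrain <- model.matrix(Apps ~ ., data = train_2)[, -1]
xTest <- model.matrix(Apps ~ ., data = test_2)[, -1]

ridgeModel <- cv.glmnet(xTrain, train_2$Apps, alpha = 0)
ridgePred <- predict(ridgeModel, newx = xTest, s = "lambda.min")
mean((test_2$Apps - ridgePred)^2)  # ridge test MSE on the shared split
```

The same idea should work in the other direction: the logical `train` / `test` masks can serve every model as long as they are used to subset `College` (for example `College[train, ]`) rather than being passed as `data` themselves.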

Thanks

0 answers:

No answers yet