I am trying to fit a Lasso regression with a cross-validated lambda using the glmnet and caret packages. My code is:
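library(glmnet)
library(caret)
library(doParallel)  # registerDoParallel(); also attaches parallel (makePSOCKcluster, detectCores)
dim(x)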
# 121755 465
dim(y)
# 121755 1
### cv.glmnet
set.seed(2108)
cl <- makePSOCKcluster(detectCores()-2,outfile="")
registerDoParallel(cl)
system.time(
las.glm <- cv.glmnet(x = x, y = y, alpha = 1, type.measure = "mse", parallel = TRUE,
                     nfolds = 5, lambda = seq(0.001, 0.1, by = 0.001),
                     standardize = FALSE)
)
stopCluster(cl)
# user system elapsed
# 17.98 2.28 37.23
### caret
caretctrl <- trainControl(method = "cv", number = 5)
tune <- expand.grid(alpha=1,lambda = seq(0.001,0.1,by = 0.001))
set.seed(2108)
cl <- makePSOCKcluster(detectCores()-2,outfile="")
registerDoParallel(cl)
system.time(
las.car <- train(x=x, y=as.numeric(y),alpha=1,method="glmnet",
metric="RMSE", allowParallel = TRUE,
trControl = caretctrl, tuneGrid = tune)
)
stopCluster(cl)
# error
Something is wrong; all the RMSE metric values are missing:
RMSE Rsquared MAE
Min. : NA Min. : NA Min. : NA
1st Qu.: NA 1st Qu.: NA 1st Qu.: NA
Median : NA Median : NA Median : NA
Mean :NaN Mean :NaN Mean :NaN
3rd Qu.: NA 3rd Qu.: NA 3rd Qu.: NA
Max. : NA Max. : NA Max. : NA
NA's :100 NA's :100 NA's :100
Error: Stopping
In addition: Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.
Timing stopped at: 3.97 1.37 127.9
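For reference, the "NA's :100" in the summary above matches the size of the tuning grid: caret reports one resampled result per (alpha, lambda) row, so none of the 100 candidate lambda values produced an RMSE. A base-R check using only the question's own grid:

```r
# The question's tuning grid: a single alpha and 100 lambda values
tune <- expand.grid(alpha = 1, lambda = seq(0.001, 0.1, by = 0.001))

# One row per candidate that caret resamples and summarises
nrow(tune)
```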
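I know this error can occur when one of the resamples does not have enough data, but I doubt that is the problem here, given the size of my data and only 5 folds. I tried the following solution, which did not work for me:
allowParallel
I think caret is performing some additional resampling that glmnet does not, and that this causes the error. Can anyone shed some light on this?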
Edit 1: x is a semi-sparse matrix of 210 indicator variables and 255 continuous variables.
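Because x is described as semi-sparse, one option is to store it as a sparse dgCMatrix from the Matrix package, which glmnet accepts directly (caret's documentation for train() also mentions sparse matrices for x, as long as they have column names). A minimal sketch on simulated data; the dimensions, sparsity and column names below are made up for illustration and are not the question's data:

```r
library(Matrix)

set.seed(1)
n <- 1000
# Hypothetical stand-in for x: mostly-zero 0/1 indicators plus continuous columns
ind  <- matrix(rbinom(n * 4, size = 1, prob = 0.05), nrow = n)
cont <- matrix(rnorm(n * 3), nrow = n)
x_dense <- cbind(ind, cont)
colnames(x_dense) <- c(paste0("ind", 1:4), paste0("cont", 1:3))

# Compressed sparse column storage: only the non-zero entries are kept
x_sparse <- Matrix(x_dense, sparse = TRUE)

# Fraction of entries that are actually stored
nnzero(x_sparse) / prod(dim(x_sparse))
```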
Answer (score: 1)
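I think most of the problem is caused by setting alpha = 1 a second time in train. caret already takes alpha from tuneGrid and passes it to glmnet, and any extra arguments given to train() are forwarded to glmnet as well, so glmnet receives alpha twice, every model fit fails, and all the RMSE values end up NA. Using an example, it works even if your x and y are sparse:
library(glmnet)
library(caret)
library(Matrix)
dat = Matrix(as.matrix(mtcars),sparse=TRUE)  # sparse copy; use x = dat[,-1], y = dat[,1] to try the sparse case
x = as.matrix(mtcars[,-1])
y = as.matrix(mtcars[,1])
L = seq(0.001,0.1,by = 0.02)
las.glm <- cv.glmnet(x=x, y=y,alpha=1,type.measure="mse",nfolds = 5, lambda = L,standardize=FALSE)
So cv.glmnet works. Now, if we try your code, it returns the error:
caretctrl <- trainControl(method = "cv", number = 5)
tune <- expand.grid(alpha=1,lambda = L)
las.car <- train(x=x, y=as.numeric(y),alpha=1,method="glmnet",
metric="RMSE",trControl = caretctrl, tuneGrid = tune)
Something is wrong; all the RMSE metric values are missing:
RMSE Rsquared MAE
Min. : NA Min. : NA Min. : NA
1st Qu.: NA 1st Qu.: NA 1st Qu.: NA
Remove the alpha argument from the train() call (alpha is already set in tuneGrid):
las.car <- train(x=x, y=as.numeric(y),method="glmnet",
metric="RMSE",trControl = caretctrl, tuneGrid = tune)
It also works with dense matrices.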
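The failure mode described in the answer can be reproduced without caret or glmnet. In this sketch, fit_model() and caret_like_fit() are hypothetical stand-ins: the wrapper passes alpha from the tuning grid and forwards any extra arguments through `...`, roughly the way train() forwards extra arguments to glmnet:

```r
# Hypothetical stand-in for the model function (plays the role of glmnet::glmnet)
fit_model <- function(x, y, alpha = 1, lambda = NULL, ...) {
  list(alpha = alpha, lambda = lambda)
}

# Hypothetical wrapper: alpha comes from the tuning grid, extras go through `...`
caret_like_fit <- function(x, y, param, ...) {
  fit_model(x, y, alpha = param$alpha, lambda = param$lambda, ...)
}

param <- data.frame(alpha = 1, lambda = 0.01)

# Works: alpha is supplied once, from the grid
ok <- caret_like_fit(1:3, 1:3, param)

# Fails: a second alpha = 1 arrives through `...`
res <- tryCatch(caret_like_fit(1:3, 1:3, param, alpha = 1),
                error = function(e) conditionMessage(e))
print(res)
```

Dropping the extra alpha = 1 leaves a single alpha, so the call goes through.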