Glmnet vs caret in R: getting an error in caret but not in glmnet

Date: 2019-09-11 13:48:27

Tags: r r-caret glmnet

I am trying to fit a Lasso regression with a cross-validated lambda, using both the glmnet and caret packages. My code is:

dim(x)
# 121755    465
dim(y)
# 121755      1

### cv.glmnet
set.seed(2108)
cl <- makePSOCKcluster(detectCores()-2,outfile="")
registerDoParallel(cl)
system.time(
  las.glm <- cv.glmnet(x=x, y=y,alpha=1,type.measure="mse",parallel = TRUE,
                      nfolds = 5, lambda = seq(0.001,0.1,by = 0.001),
                      standardize=F) 
)
stopCluster(cl)

# user  system elapsed 
# 17.98 2.28   37.23 


### caret
caretctrl <- trainControl(method = "cv", number = 5)
tune <- expand.grid(alpha=1,lambda = seq(0.001,0.1,by = 0.001))

set.seed(2108)
cl <- makePSOCKcluster(detectCores()-2,outfile="")
registerDoParallel(cl)
system.time(
  las.car <- train(x=x, y=as.numeric(y),alpha=1,method="glmnet",
                   metric="RMSE", allowParallel = TRUE,
                   trControl = caretctrl, tuneGrid = tune) 
)
stopCluster(cl)

# error
Something is wrong; all the RMSE metric values are missing:
  RMSE        Rsquared        MAE     
Min.   : NA   Min.   : NA   Min.   : NA  
1st Qu.: NA   1st Qu.: NA   1st Qu.: NA  
Median : NA   Median : NA   Median : NA  
Mean   :NaN   Mean   :NaN   Mean   :NaN  
3rd Qu.: NA   3rd Qu.: NA   3rd Qu.: NA  
Max.   : NA   Max.   : NA   Max.   : NA  
NA's   :100   NA's   :100   NA's   :100  
Error: Stopping
In addition: Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  :
  There were missing values in resampled performance measures.
Timing stopped at: 3.97 1.37 127.9

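I know this can happen when one of the resamples does not contain enough data, but I doubt that is the problem given the size of my data and only 5 folds. I tried the following solutions, which did not work for me: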

I think caret is doing some additional resampling that glmnet does not do, and that this causes the error. Can someone shed light on this?

Edit 1: x is a semi-sparse matrix of 210 indicator variables and 255 continuous variables.

1 answer:

Answer 0 (score: 1)

I think the problem is most likely caused by setting alpha = 1 a second time in the call to train, when it is already set in tuneGrid. Using an example, cv.glmnet works even if your x, y is sparse:

library(glmnet)
library(caret)
library(Matrix)

dat = Matrix(as.matrix(mtcars),sparse=TRUE)
x = as.matrix(mtcars[,-1])
y = as.matrix(mtcars[,1])

L = seq(0.001,0.1,by = 0.02)

las.glm <- cv.glmnet(x=x, y=y,alpha=1,type.measure="mse",nfolds = 5, lambda = L,standardize=FALSE)

So cv.glmnet works. Now, if we try your code, it returns the error:

caretctrl <- trainControl(method = "cv", number = 5)
tune <- expand.grid(alpha=1,lambda = L)

las.car <- train(x=x, y=as.numeric(y),alpha=1,method="glmnet",
                   metric="RMSE",trControl = caretctrl, tuneGrid = tune) 

Something is wrong; all the RMSE metric values are missing:
      RMSE        Rsquared        MAE     
 Min.   : NA   Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA   1st Qu.: NA  

Removing the alpha argument from the call to train makes the error go away.

It also works for a dense matrix.
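
For reference, here is a minimal sketch of the train call with the alpha argument removed, using the same mtcars toy data as above. The seed value is borrowed from the question, and the explanation in the comments is my assumption rather than something the error output states: caret already hands the alpha from tuneGrid to glmnet, so an extra alpha = 1 in train appears to be forwarded a second time, which makes every resample fit fail.

```r
library(caret)   # train() and trainControl()
library(glmnet)  # backend used by method = "glmnet"

# Toy data: mpg is the response, the remaining columns are predictors
x <- as.matrix(mtcars[, -1])
y <- mtcars[, 1]

L <- seq(0.001, 0.1, by = 0.02)
caretctrl <- trainControl(method = "cv", number = 5)
tune <- expand.grid(alpha = 1, lambda = L)  # alpha is set here, and only here

set.seed(2108)
# No alpha = 1 in this call (assumption: caret supplies it from tuneGrid)
las.car <- train(x = x, y = y, method = "glmnet",
                 metric = "RMSE", trControl = caretctrl, tuneGrid = tune)

las.car$bestTune  # alpha and the lambda with the lowest cross-validated RMSE
```

If the fits succeed, las.car$results holds one row per lambda value with non-missing RMSE, Rsquared and MAE columns, instead of the all-NA summary shown above.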