Error with cross-validated logistic regression using lasso regularization

Time: 2020-06-08 17:38:54

Tags: r logistic-regression cross-validation glmnet lasso-regression

I want to build a 5-fold CV logistic regression model with lasso regularization, but I get the following error message: Something is wrong; all the RMSE metric values are missing:

I started by fitting a logistic regression with lasso regularization by setting alpha=1. This works. I extended it starting from this example.

# Load data set
library(glmnet)
data("mtcars")

# Prepare data set 
x   <- model.matrix(~.-1, data= mtcars[,-1])
mpg <- ifelse( mtcars$mpg < mean(mtcars$mpg), 0, 1)
y   <- factor(mpg, labels = c('notEfficient', 'efficient'))

# find lambda.min, the lambda with the lowest cross-validated deviance (10-fold by default)
mod_cv <- cv.glmnet(x=x, y=y, family='binomial', alpha=1)

# refit the lasso logistic regression at lambda.min
logistic_model <- glmnet(x, y, alpha=1, family = "binomial",
                         lambda = mod_cv$lambda.min)
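A minimal sketch, assuming glmnet is installed: cv.glmnet already keeps the lasso path fitted on the full data (mod_cv$glmnet.fit), so coefficients and predictions at lambda.min can be read from mod_cv directly instead of refitting glmnet with a single lambda value. The set.seed call is an addition for reproducibility and the seed value is arbitrary.

```r
library(glmnet)

data("mtcars")
x <- model.matrix(~ . - 1, data = mtcars[, -1])
y <- factor(ifelse(mtcars$mpg < mean(mtcars$mpg), 0, 1),
            labels = c("notEfficient", "efficient"))

set.seed(1)  # cv.glmnet assigns folds at random
mod_cv <- cv.glmnet(x = x, y = y, family = "binomial", alpha = 1)

# Lasso coefficients at the CV-selected lambda, taken from the path that
# cv.glmnet already fitted on the full data (no second glmnet() call).
beta <- coef(mod_cv, s = "lambda.min")
print(beta)

# Class labels for the training rows at the same lambda.
pred <- predict(mod_cv, newx = x, s = "lambda.min", type = "class")
print(table(predicted = pred[, 1], observed = y))
```

Coefficients shrunk exactly to zero appear as "." in the sparse matrix printout; those are the variables the lasso dropped.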

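I understand that cv.glmnet already performs 10-fold CV by default, but I want to use 5-fold CV. However, when I add n_folds to cv.glmnet I can no longer find the optimal lambda, and when I change trControl I cannot build the model.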

# find lambda.min using 5-fold CV
mod_cv <- cv.glmnet(x=x, y=y, family='binomial', alpha=1, n_folds=5)

# Error in glmnet(x, y, weights = weights, offset = offset, lambda = lambda,  : 
#   unused argument (n_folds = 5)
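When R reports an unused argument, listing the function's formal arguments is a quick check. A minimal sketch, assuming glmnet is installed; cv_args is a hypothetical name used only here. Because cv.glmnet ends its signature with ..., an argument it does not recognise is forwarded to its internal glmnet() call rather than rejected up front, which is consistent with the error above naming glmnet instead of cv.glmnet.

```r
library(glmnet)

# Argument names that cv.glmnet() itself declares.
cv_args <- names(formals(cv.glmnet))

"nfolds"  %in% cv_args   # TRUE: the number of folds is set here
"n_folds" %in% cv_args   # FALSE: not a cv.glmnet argument
```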

# logistic regression with 5-fold CV
library(caret)

# define training control
train_control <- trainControl(method = "cv", number = 5)

# train the model with 5-fold cv
model <- train(x, y, trControl = train_control, method = "glm", family="binomial", alpha=1)

# Something is wrong; all the Accuracy metric values are missing:
#     Accuracy       Kappa    
#  Min.   : NA   Min.   : NA  
#  1st Qu.: NA   1st Qu.: NA  
#  Median : NA   Median : NA  
#  Mean   :NaN   Mean   :NaN  
#  3rd Qu.: NA   3rd Qu.: NA  
#  Max.   : NA   Max.   : NA  
#  NA's   :1     NA's   :1  
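The missing metrics can be traced by fitting the same model outside caret. A minimal base-R sketch, assuming (as caret's glm wrapper appears to do) that extra arguments given to train are forwarded to stats::glm(). glm() uses its ... to build glm.control(), which has no alpha argument, so each fold's fit fails and caret is left with no Accuracy or Kappa values. The column name .outcome is only an illustration.

```r
data("mtcars")
x <- model.matrix(~ . - 1, data = mtcars[, -1])
y <- factor(ifelse(mtcars$mpg < mean(mtcars$mpg), 0, 1),
            labels = c("notEfficient", "efficient"))
dat <- data.frame(x, .outcome = y)

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  {
    glm(.outcome ~ ., data = dat, family = binomial, alpha = 1)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)  # "unused argument (alpha = 1)"
```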

Why do I get errors when I add 5-fold CV?

1 Answer:

Answer 0 (score: 2)

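There are 2 problems in your code: 1) the n_folds argument of cv.glmnet is actually called nfolds, and 2) train does not accept an alpha argument here: with method = "glm", the extra alpha = 1 is passed on to glm(), which has no such argument, so the model fails in every fold and all metrics come back missing. If you fix these, your code works: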

# Load data set
data("mtcars")
library(glmnet)
library(caret)

# Prepare data set 
x   <- model.matrix(~.-1, data= mtcars[,-1])
mpg <- ifelse( mtcars$mpg < mean(mtcars$mpg), 0, 1)
y   <- factor(mpg, labels = c('notEfficient', 'efficient'))

# find lambda.min, the lambda with the lowest cross-validated deviance (10-fold by default)
mod_cv <- cv.glmnet(x=x, y=y, family='binomial', alpha=1)

# refit the lasso logistic regression at lambda.min
logistic_model <- glmnet(x, y, alpha=1, family = "binomial",
                         lambda = mod_cv$lambda.min)

# find lambda.min using 5-fold CV (the argument is nfolds, not n_folds)
mod_cv <- cv.glmnet(x=x, y=y, family='binomial', alpha=1, nfolds=5)
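cv.glmnet assigns rows to folds at random, so lambda.min can change from run to run on a data set as small as mtcars. A minimal sketch, assuming glmnet is installed, that fixes the 5 folds through cv.glmnet's foldid argument; the seed and the name foldid_5 are arbitrary choices for illustration.

```r
library(glmnet)

data("mtcars")
x <- model.matrix(~ . - 1, data = mtcars[, -1])
y <- factor(ifelse(mtcars$mpg < mean(mtcars$mpg), 0, 1),
            labels = c("notEfficient", "efficient"))

# Assign each row to one of 5 folds, as evenly as possible.
set.seed(123)
foldid_5 <- sample(rep(1:5, length.out = nrow(x)))
table(foldid_5)  # fold sizes 7, 7, 6, 6, 6 for the 32 rows

# With foldid supplied, nfolds is not needed and the same folds are
# reused every time this call is run.
mod_cv5 <- cv.glmnet(x = x, y = y, family = "binomial", alpha = 1,
                     foldid = foldid_5)

mod_cv5$lambda.min  # lambda with the lowest CV deviance
mod_cv5$lambda.1se  # largest lambda within one SE of that minimum
```

Passing the same foldid_5 to several calls (for example with different alpha values) also makes their CV errors directly comparable.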


# logistic regression evaluated with 5-fold CV (method = "glm" fits an unpenalized model)
# define training control
train_control <- trainControl(method = "cv", number = 5)

# train the model with 5-fold cv
model <- train(x, y, trControl = train_control, method = "glm", family="binomial")
model$results
#>  parameter  Accuracy     Kappa AccuracySD   KappaSD
#>1      none 0.8742857 0.7362213 0.07450517 0.1644257
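Note that method = "glm" in train fits ordinary, unpenalized logistic regression, so the corrected train call above no longer applies the lasso. If the goal is a lasso logistic regression tuned by 5-fold CV inside caret, a minimal sketch, assuming caret and glmnet are installed, uses caret's "glmnet" method with alpha fixed at 1 in tuneGrid so that only lambda is tuned. The lambda grid and the names lasso_grid and lasso_model are illustrative assumptions, not recommendations.

```r
library(caret)
library(glmnet)

data("mtcars")
x <- model.matrix(~ . - 1, data = mtcars[, -1])
y <- factor(ifelse(mtcars$mpg < mean(mtcars$mpg), 0, 1),
            labels = c("notEfficient", "efficient"))

set.seed(1)  # make the fold assignment reproducible
train_control <- trainControl(method = "cv", number = 5)

# alpha = 1 fixes the penalty to the lasso; only lambda varies.
lasso_grid <- expand.grid(alpha = 1,
                          lambda = 10^seq(-3, 0, length.out = 20))

lasso_model <- train(x, y,
                     method    = "glmnet",
                     family    = "binomial",
                     trControl = train_control,
                     tuneGrid  = lasso_grid)

print(lasso_model$bestTune)  # alpha and lambda with the best CV accuracy
print(coef(lasso_model$finalModel, s = lasso_model$bestTune$lambda))
```

Here alpha is a tuning parameter of the "glmnet" method, so it belongs in tuneGrid rather than being passed to train as an extra argument.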