Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, ...) when using glmnet in caret

Date: 2018-01-10 02:27:25

Tags: r r-caret glmnet

Error when using glmnet in caret

The example below reproduces the problem. Load the libraries:

library(dplyr)
library(caret)
library(C50)

Load the churn dataset from the C50 package:

data(churn)

Create the x and y variables:

churn_x <- subset(churnTest, select= -churn)   
churn_y <- churnTest[[20]]

Create 5 CV folds on churn_y (the target variable) using createFolds():

myFolds <- createFolds(churn_y, k = 5)

Create the trainControl object, myControl:

myControl <- trainControl(
 summaryFunction = twoClassSummary,
 classProbs = TRUE, # IMPORTANT!
 verboseIter = TRUE,
 savePredictions = TRUE,
 index = myFolds
)

Fit the glmnet model, model_glmnet:

model_glmnet <- train(
  x = churn_x, y = churn_y,
  metric = "ROC",
  method = "glmnet",
  trControl = myControl
)

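I get the following error:

Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs,  :
  NA/NaN/Inf in foreign function call (arg 5)
In addition: Warning message:
In lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs,  :
  NAs introduced by coercion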

I have checked, and there are no missing values in churn_x:

sum(is.na(churn_x))

Does anyone know the answer?

2 Answers:

Answer 0: (score: 1)

The problem is in the model specification. If you use caret's train() formula interface, training works:

train <- data.frame(churn_x, churn_y)
model_glmnet <- train(
  churn_y ~ ., data = train,
  metric = "ROC",
  method = "glmnet",
  trControl = myControl
)

> model_glmnet$results
  alpha       lambda       ROC      Sens      Spec      ROCSD     SensSD      SpecSD
1  0.10 0.0001754386 0.6958156 0.2845934 0.9123349 0.01855530 0.01616471 0.004002873
2  0.10 0.0017543858 0.7187303 0.2901986 0.9185721 0.01681286 0.01415863 0.005347573
3  0.10 0.0175438576 0.7399174 0.2355121 0.9487161 0.01482812 0.03932741 0.010769455
4  0.55 0.0001754386 0.6988285 0.2901800 0.9121614 0.01907845 0.01312159 0.004200233
5  0.55 0.0017543858 0.7260286 0.2946617 0.9185714 0.01761485 0.02171189 0.006755247
6  0.55 0.0175438576 0.7630039 0.2008939 0.9617103 0.01743847 0.03989938 0.006118592
7  1.00 0.0001754386 0.7009482 0.2924146 0.9119881 0.01958200 0.01233419 0.004157393
8  1.00 0.0017543858 0.7313495 0.2957728 0.9203040 0.01797853 0.02356945 0.008478577
9  1.00 0.0175438576 0.7672690 0.1595779 0.9760892 0.01935176 0.01935583 0.007938801

However, when you specify x and y directly, it does not work, because glmnet takes x in the form of a model matrix. When you supply a formula, caret takes care of creating the model.matrix. If you just specify x and y, it assumes x is already a model matrix and passes it to glmnet as is. For example, this works:

x <- model.matrix(churn_y ~ ., data = train)
model_glmnet2 <- train(
  x = x, y = churn_y,
  metric = "ROC",
  method = "glmnet",
  trControl = myControl
)

> model_glmnet2$results
  alpha       lambda       ROC      Sens      Spec      ROCSD     SensSD      SpecSD
1  0.10 0.0001754386 0.6958156 0.2845934 0.9123349 0.01855530 0.01616471 0.004002873
2  0.10 0.0017543858 0.7187303 0.2901986 0.9185721 0.01681286 0.01415863 0.005347573
3  0.10 0.0175438576 0.7399174 0.2355121 0.9487161 0.01482812 0.03932741 0.010769455
4  0.55 0.0001754386 0.6988285 0.2901800 0.9121614 0.01907845 0.01312159 0.004200233
5  0.55 0.0017543858 0.7260286 0.2946617 0.9185714 0.01761485 0.02171189 0.006755247
6  0.55 0.0175438576 0.7630039 0.2008939 0.9617103 0.01743847 0.03989938 0.006118592
7  1.00 0.0001754386 0.7009482 0.2924146 0.9119881 0.01958200 0.01233419 0.004157393
8  1.00 0.0017543858 0.7313495 0.2957728 0.9203040 0.01797853 0.02356945 0.008478577
9  1.00 0.0175438576 0.7672690 0.1595779 0.9760892 0.01935176 0.01935583 0.007938801

model.matrix is only needed when there are factor features.
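To make the model.matrix step more concrete, here is a minimal sketch on a small made-up data frame (the names df, plan and calls are hypothetical and are not part of the churn data). It only illustrates how a factor column is expanded into numeric dummy columns, which is the form glmnet expects for x:

```r
# Toy data (hypothetical): one factor column and one numeric column
df <- data.frame(
  plan  = factor(c("yes", "no", "yes", "no")),
  calls = c(1, 5, 2, 7)
)

# model.matrix() expands the factor into 0/1 dummy columns
# and adds an "(Intercept)" column
mm <- model.matrix(~ ., data = df)
print(colnames(mm))

# glmnet fits its own intercept by default, so the intercept
# column is redundant and can be dropped before passing x on
x <- mm[, -1, drop = FALSE]
print(is.numeric(x))
```

With only numeric columns in the data frame there is nothing to expand, which is why the step matters only when factor features are present.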

Answer 1: (score: 1)

If you want to use glmnet and run into the same error, try this!

Short answer: using data.matrix() solved my problem!

Initially, I was doing:

# Given X and Y are data frames
cv.glmnet(x = as.matrix(X), y = as.matrix(Y), alpha = 1, family = "binomial")

This was fixed by:

cv.glmnet(x = data.matrix(X), y = as.matrix(Y), alpha = 1, family = "binomial")

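Longer answer (not long at all):

I had the same problem. I was passing the X matrix using as.matrix(), and if you happen to have factors in the data frame, as.matrix() turns everything into character. Using data.matrix() fixed it for me. data.matrix() can handle factors and ordered factors, whereas as.matrix() is more basic.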
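The difference between the two conversions can be seen on a small made-up data frame (df, plan and calls are hypothetical names used only for illustration). This is a sketch of the behaviour described above, not the churn data itself:

```r
# Toy data (hypothetical): one factor column and one numeric column
df <- data.frame(
  plan  = factor(c("yes", "no", "yes")),
  calls = c(1, 5, 2)
)

# as.matrix() on a data frame with a non-numeric column
# yields a character matrix
m_char <- as.matrix(df)
print(is.character(m_char))

# Coercing those strings to numbers produces NA
# ("NAs introduced by coercion"), hence the error further down the line
coerced <- suppressWarnings(as.numeric(m_char[, "plan"]))
print(anyNA(coerced))

# data.matrix() replaces each factor by its internal integer codes,
# so the result is a numeric matrix
m_num <- data.matrix(df)
print(is.numeric(m_num))
print(m_num[, "plan"])
```

Note that data.matrix() encodes a factor by its level codes, which implies an ordering between levels. For factors with more than two levels, dummy coding via model.matrix() is probably the safer choice.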