Question

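I want to use logistic regression (my output is categorical) with a lasso penalty to select the important variables from my dataset "data", then take those important variables ("variables"), test them on the validation set x.test, and compare the predicted values with the actual values. However, I get this error:

Error in cbind2(1, newx) %*% nbeta : Cholmod error 'X and/or Y have wrong dimensions' at file ../MatrixOps/cholmod_sdmult.c, line 90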

library(glmnet)
library(caret)
# Class label must be binary: 0 = no event, 1 = anomalous
iris$Species<-ifelse(iris$Species=="setosa",0,1)
#data$Cardio1M=factor(data$Cardio1M)
#split data into train and test
trainIndex <- createDataPartition(iris$Species, p=0.7, list=FALSE)
data_train <- iris[ trainIndex,]
data_test <- iris[-trainIndex,]
# Drop the last column (the response). Note the parentheses: 1:n-1 parses as (1:n)-1.
x.train <- data.matrix(data_train[, 1:(ncol(data_train) - 1)])
y.train <- data.matrix(data_train$Species)
x.test <- data.matrix(data_test[, 1:(ncol(data_test) - 1)])
y.test <- data.matrix(data_test$Species)
# Fit a generalized linear model: with alpha=0 ridge regression is used, with alpha=1 the lasso.
# The model is fit over a grid of lambda values (the shrinkage coefficient).
# Each value of lambda has an associated vector of regression coefficients. For example, the 100th value of lambda, a very small one, gives a fit close to the unpenalized one:
Lasso.mod <- glmnet(x.train, y.train, alpha=1, nlambda=100, lambda.min.ratio=0.0001,family="binomial")
# Use 10-fold cross-validation to choose the optimal lambda.
set.seed(1)
#cv.out <- cv.glmnet(x, y, alpha=1,family="binomial", nlambda=100, lambda.min.ratio=0.0001,type.measure = "class")
cv.out <- cv.glmnet(x.train, y.train, alpha=1,family="binomial", nlambda=100, type.measure = "class")
# Plot the misclassification error against the different values of lambda
plot(cv.out)
best.lambda <- cv.out$lambda.min
best.lambda
co<-coef(cv.out, s = "lambda.min")
# Once we have the best lambda, we can use predict to obtain the coefficients.
# iris has 4 predictors plus the intercept, so there are only 5 rows to index.
p<-predict(Lasso.mod, s=best.lambda, type="coefficients")[1:5, ]
p

I want to test whether the selected features help reduce the error on the test set, but I get the error even with the iris dataset.

#Selection of the significant features(predictors)
inds<-which(co!=0)
variables<-row.names(co)[inds]
variables<-variables[!(variables %in% '(Intercept)')];
#predict output values based on selected predictors
p <- predict(cv.out, s=best.lambda, newx=x.test,type="class")
# Calculate accuracy
Accuracy<- mean(p==y.test)

Answer 1

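I tried to leave a comment explaining what went wrong, but it got too long, so I had to post an answer. Also, I know that the following is a reason you are getting the error, but without a reproducible example I can't guarantee there are no other problems.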

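The main problem is that you are using x.test[, variables] instead of x.test. The object cv.out contains all the variable names, including those whose coefficients were shrunk to 0, so the predict command does not know where to find them, because you subset x.test to include only the variables with significant (nonzero) coefficients.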
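The mismatch can be reproduced in a few lines. This is a minimal sketch, assuming the two-class iris setup from the question and fitting on the whole dataset for brevity. The exact wording of the error depends on the installed glmnet version, so the message is captured rather than assumed.

```r
library(glmnet)

# Two-class version of iris, as in the question (setosa = 0, everything else = 1)
x <- data.matrix(iris[, 1:4])
y <- ifelse(iris$Species == "setosa", 0, 1)

set.seed(1)
cv.out <- cv.glmnet(x, y, alpha = 1, family = "binomial", type.measure = "class")

# The coefficient vector has one row per training column plus the intercept,
# including the rows whose coefficient was shrunk to exactly 0
co <- coef(cv.out, s = "lambda.min")
nrow(co)  # 5: four predictors plus the intercept

# Names of the predictors with nonzero coefficients
variables <- setdiff(rownames(co)[co[, 1] != 0], "(Intercept)")

# Predicting from the full matrix works
p_full <- predict(cv.out, newx = x, s = "lambda.min", type = "class")

# Predicting from the subsetted matrix fails unless every column was selected;
# tryCatch stores the error message instead of stopping the script
p_sub <- tryCatch(
  predict(cv.out, newx = x[, variables, drop = FALSE],
          s = "lambda.min", type = "class"),
  error = function(e) conditionMessage(e)
)
p_sub
```

The fitted model always multiplies newx by the full coefficient vector, so newx needs the same columns, in the same order, as the training matrix.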

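Even if that were not the case, it still would not work. The reason is that you obtained the significant coefficients using s = "lambda.min", but then tried to predict using s=cv.out$lambda.1se. The problem is that a variable, say X2, that is zeroed out in the lambda.min model may still be significant in the lambda.1se model. So when the predict command tries to find it in x.test, it can't, because it is not in variables.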
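One way to see whether the two penalties agree is to list the nonzero predictors at each of them. The sketch below again assumes the two-class iris setup from the question; selected_at is a hypothetical helper name introduced only for this illustration, not a glmnet function.

```r
library(glmnet)

x <- data.matrix(iris[, 1:4])
y <- ifelse(iris$Species == "setosa", 0, 1)

set.seed(1)
cv.out <- cv.glmnet(x, y, alpha = 1, family = "binomial", type.measure = "class")

# Hypothetical helper: names of the predictors with nonzero coefficients at s
selected_at <- function(fit, s) {
  co <- coef(fit, s = s)
  setdiff(rownames(co)[co[, 1] != 0], "(Intercept)")
}

vars_min <- selected_at(cv.out, "lambda.min")
vars_1se <- selected_at(cv.out, "lambda.1se")

# Variables selected at one penalty but not at the other
setdiff(vars_1se, vars_min)
setdiff(vars_min, vars_1se)
```

If either difference is non-empty, a variable list taken at one value of s does not describe the model used at the other, so the same s should be used for both selection and prediction.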

So in the end, what you should do is:

p <- predict(Lasso.mod, s=best.lambda, newx=x.test, type="class")
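If the goal is to check how the selected variables perform on their own, one option is to refit a model on just those columns and compare test accuracy. This is a sketch under several assumptions that are not in the original post: a plain random split instead of caret, an unpenalized glm for the refit, and a 0.5 probability cutoff. Setosa is perfectly separable from the other species, so glm emits convergence warnings here, which are suppressed.

```r
library(glmnet)

set.seed(1)
dat <- iris
dat$Species <- ifelse(dat$Species == "setosa", 0, 1)

# Simple 70/30 random split (assumption; the question uses caret::createDataPartition)
idx <- sample(nrow(dat), size = round(0.7 * nrow(dat)))
x.train <- data.matrix(dat[idx, 1:4]);  y.train <- dat$Species[idx]
x.test  <- data.matrix(dat[-idx, 1:4]); y.test  <- dat$Species[-idx]

cv.out <- cv.glmnet(x.train, y.train, alpha = 1,
                    family = "binomial", type.measure = "class")
co <- coef(cv.out, s = "lambda.min")
variables <- setdiff(rownames(co)[co[, 1] != 0], "(Intercept)")
stopifnot(length(variables) > 0)

# Accuracy of the lasso model itself, predicting from the full test matrix
p_lasso <- predict(cv.out, newx = x.test, s = "lambda.min", type = "class")
acc_lasso <- mean(as.numeric(p_lasso) == y.test)

# Refit an unpenalized logistic regression on the selected columns only
train_df <- data.frame(y = y.train, x.train[, variables, drop = FALSE])
test_df  <- data.frame(x.test[, variables, drop = FALSE])
fit_sel  <- suppressWarnings(
  glm(reformulate(variables, response = "y"), data = train_df, family = binomial)
)
prob_sel <- suppressWarnings(predict(fit_sel, newdata = test_df, type = "response"))
acc_sel  <- mean(as.numeric(prob_sel > 0.5) == y.test)

c(lasso = acc_lasso, selected_only = acc_sel)
```

The refit is a separate model, so subsetting x.test is valid there: its training matrix had exactly the columns in variables.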

There are other problems with your code as well, but I don't believe they cause the error message. I hope this helps!

Major update

Other things you should also fix:

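When you create x.test and x.train, change length to ncol. In fact, in both cases you want the form data_test[, 1:(ncol(data_test) - 1)] (with data_train for the training set). Even though length and ncol give the same number in this case, they would not if the object were a matrix rather than a data.frame. You also need the -1 part, because otherwise you include y in x.
Change type="response" to type="class" when you create p, otherwise your Accuracy will be 0. (I have changed this in the code above.)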
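Both points can be checked directly. The first part of this sketch uses only base R and the built-in iris data. The second part fits a glmnet model on the two-class iris setup from the question; picking a penalty from the middle of the lambda path is an arbitrary choice made for illustration.

```r
library(glmnet)

# length vs ncol: the same for a data.frame, different for a matrix
m <- data.matrix(iris)
length(iris)  # 5: a data.frame is a list of columns
ncol(iris)    # 5
length(m)     # 750: a matrix is a vector of all its cells
ncol(m)       # 5

# Operator precedence: ":" binds tighter than "-"
n <- ncol(iris)
idx_loose <- 1:n - 1    # (1:n) - 1, i.e. 0 1 2 3 4; index 0 is silently dropped
idx_tight <- 1:(n - 1)  # 1 2 3 4

# type = "class" returns labels, type = "response" returns probabilities
x <- data.matrix(iris[, 1:4])
y <- ifelse(iris$Species == "setosa", 0, 1)
fit <- glmnet(x, y, family = "binomial")
s <- fit$lambda[ceiling(length(fit$lambda) / 2)]  # arbitrary mid-path penalty

p_class <- predict(fit, newx = x, s = s, type = "class")     # "0" / "1"
p_resp  <- predict(fit, newx = x, s = s, type = "response")  # fitted probabilities

mean(p_class == y)  # proportion of correctly predicted labels
mean(p_resp == y)   # compares probabilities with 0/1, so it is not an accuracy
```

The unparenthesized index works in the question only because R ignores the 0 in a positive index vector; the parenthesized form says what is meant.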

Lasso feature selection in R
