R: Feature selection with cross-validation using caret for logistic regression

Time: 2017-02-18 11:58:33

Tags: r r-caret cross-validation feature-extraction

I am currently learning how to implement logistic regression in R.

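I have taken a data set and split it into training and test sets, and I would like to use cross-validation to carry out forward selection, backward selection, and best subset selection in order to choose the best features. I am using caret to run the cross-validation on the training set and then check the predictions against the test set.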
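To make concrete what I mean by forward selection under cross-validation, here is a minimal sketch in base R. It uses simulated data rather than my real data set, and `cv_accuracy()` and `forward_select()` are hypothetical helper names of my own, not caret functions. It greedily adds the variable that most improves k-fold accuracy and stops when nothing improves it.

```r
# Forward selection for logistic regression, scored by k-fold CV accuracy.
# Base R only. The data are simulated and the helper names are hypothetical.
set.seed(42)
n <- 400
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
# Only x1 and x2 drive the outcome; x3 and x4 are pure noise.
dat$y <- factor(ifelse(1.5 * dat$x1 - 1.0 * dat$x2 + rnorm(n) > 0, "Good", "Bad"))

# Mean held-out accuracy of a glm using `vars` (the outcome column must be `y`).
cv_accuracy <- function(vars, data, folds) {
  rhs <- if (length(vars) == 0) "1" else paste(vars, collapse = " + ")
  form <- as.formula(paste("y ~", rhs))
  acc <- sapply(sort(unique(folds)), function(k) {
    fit <- glm(form, data = data[folds != k, ], family = binomial)
    # glm models the probability of the second factor level
    p <- predict(fit, newdata = data[folds == k, ], type = "response")
    pred <- ifelse(p > 0.5, levels(data$y)[2], levels(data$y)[1])
    mean(pred == data$y[folds == k])
  })
  mean(acc)
}

# Greedy forward search over `candidates`, reusing the same folds throughout.
forward_select <- function(data, candidates, k = 5) {
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  selected <- character(0)
  best <- cv_accuracy(selected, data, folds)  # intercept-only baseline
  repeat {
    remaining <- setdiff(candidates, selected)
    if (length(remaining) == 0) break
    scores <- sapply(remaining, function(v) cv_accuracy(c(selected, v), data, folds))
    if (max(scores) <= best) break  # no candidate improves CV accuracy
    selected <- c(selected, names(which.max(scores)))
    best <- max(scores)
  }
  list(variables = selected, cv_accuracy = best)
}

result <- forward_select(dat, c("x1", "x2", "x3", "x4"))
print(result)
```

Because the same folds are used both to pick variables and to score them, the reported accuracy is optimistic; the held-out test set is still needed for an honest estimate.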

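I have seen the rfe control in caret, and I have also looked at the documentation on the caret website and the links in the question "How to use wrapper feature selection with algorithms in R?". It is not clear to me how to change the type of feature selection, since it seems to default to backward selection. Can anyone help me complete my workflow? A reproducible example follows.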

library("caret")

# Create an example dataset from the German Credit data shipped with caret
data(GermanCredit)
mydf <- GermanCredit

# Create train and test sets with an 80/20 split (seed fixed for reproducibility)
set.seed(123)
trainIndex <- createDataPartition(mydf$Class, p = .8,
                                  list = FALSE,
                                  times = 1)

train <- mydf[ trainIndex,]
test  <- mydf[-trainIndex,]


# Repeated 10-fold cross-validation; without `repeats` this is plain 10-fold CV
ctrl <- trainControl(method = "repeatedcv",
                     number = 10,
                     repeats = 3,
                     savePredictions = TRUE)

# Logistic regression via glm; it has no tuning parameters, so tuneLength is omitted
mod_fit <- train(Class ~ ., data = train,
                 method = "glm",
                 family = "binomial",
                 trControl = ctrl)


# Check out Variable Importance
varImp(mod_fit)
summary(mod_fit)

# Test the model on new, unseen data to check how well it generalizes
pred <- predict(mod_fit, newdata = test)
conf_mat <- table(pred, test$Class)   # confusion matrix: predicted vs. actual
sum(diag(conf_mat)) / sum(conf_mat)   # overall accuracy
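For reference, this is roughly how I understand the rfe route from the caret documentation. It is only a sketch under my own assumptions: the candidate `sizes`, the pre-filtering with `nearZeroVar()` and `findLinearCombos()`, and the choice of `defaultSummary` with accuracy as the metric are mine. Since rfe is recursive feature elimination, it starts from all predictors and removes them, which is why I only ever get backward selection out of it.

```r
library(caret)
data(GermanCredit)

set.seed(123)
in_train <- createDataPartition(GermanCredit$Class, p = 0.8, list = FALSE)
train_df <- GermanCredit[in_train, ]

x <- train_df[, setdiff(names(train_df), "Class")]
y <- train_df$Class

# Assumption: drop near-constant columns and dummy columns that are linearly
# dependent (including on the intercept) so glm can estimate every coefficient.
nzv <- nearZeroVar(x)
if (length(nzv) > 0) x <- x[, -nzv]
combos <- findLinearCombos(cbind(Intercept = 1, as.matrix(x)))
drop_cols <- setdiff(combos$remove, 1) - 1  # shift past the added intercept column
if (length(drop_cols) > 0) x <- x[, -drop_cols]

# lrFuncs is caret's set of logistic-regression helpers for rfe.
# The summary function is set explicitly so that "Accuracy" is available.
lr_funcs <- lrFuncs
lr_funcs$summary <- defaultSummary

rfe_ctrl <- rfeControl(functions = lr_funcs, method = "cv", number = 10)

# Backward elimination: each subset size is evaluated by 10-fold CV.
rfe_fit <- rfe(x, y,
               sizes = c(5, 10, 20),   # illustrative subset sizes
               metric = "Accuracy",
               rfeControl = rfe_ctrl)

print(rfe_fit)
predictors(rfe_fit)  # variables in the best-scoring subset
```

I do not see an argument in `rfeControl()` or `rfe()` that switches the search direction, so forward or best subset selection appears to need something other than rfe.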

0 answers:

No answers