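I am new to genetic algorithm feature selection with caret and started with a simple run on the iris dataset. I want to extract the best features, their accuracy, and the total number of model trainings (evaluations of feature subsets). I also don't understand how the final model is built.
caret provides a description of the method: https://topepo.github.io/caret/feature-selection-using-genetic-algorithms.html However, I did not really understand how many model trainings are needed in the end, or how the final model is built.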
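library(caret)
# Turn iris into a binary problem: setosa and versicolor become class 0, virginica class 1.
dataset <- iris
levels(dataset$Species) <- c(0, 0, 1)
# External resampling for the GA: 3-fold cross-validation.
ga_ctrl <- gafsControl(functions = caretGA, method = "cv", number = 3, verbose = TRUE)
# Predictors are all columns except the last; the outcome is the last column.
res <- caret::gafs(x = dataset[, 1:(length(dataset) - 1)],
                   y = dataset[, length(dataset)],
                   iters = 5,
                   popSize = 6,
                   pcrossover = 0.8,
                   pmutation = 0.1,
                   gafsControl = ga_ctrl,
                   # The remaining arguments are passed on to train().
                   method = "glm", family = binomial(link = 'logit'),
                   trControl = trainControl(method = "cv"))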
I got this output (without setting a specific seed for now):
Fold1 1 0.96 (2)
Fold1 2 0.96->0.97 (2->4, 50.0%) *
Fold1 3 0.97->0.9788889 (4->2, 50.0%) *
Fold1 4 0.9788889->0.9809091 (2->2, 100.0%) *
Fold1 5 0.9809091->0.98 (2->2, 100.0%)
Fold2 1 0.9718182 (3)
Fold2 2 0.9718182->0.9709091 (3->4, 75.0%)
Fold2 3 0.9718182->0.9718182 (3->4, 75.0%)
Fold2 4 0.9718182->0.9709091 (3->1, 33.3%)
Fold2 5 0.9718182->0.9688889 (3->3, 50.0%)
Fold3 1 0.97 (3)
Fold3 2 0.97->0.9688889 (3->3, 100.0%)
Fold3 3 0.97->0.97 (3->3, 100.0%)
Fold3 4 0.97->0.969798 (3->2, 66.7%)
Fold3 5 0.97->0.9709091 (3->3, 100.0%) *
+ final GA
1 0.9533333 (1)
2 0.9533333->0.96 (1->2, 50.0%) *
3 0.96->0.9533333 (2->2, 100.0%)
4 0.96->0.98 (2->3, 66.7%) *
5 0.98->0.9733333 (3->3, 100.0%)
+ final model
res$ga$fit has an accuracy of 0.96, while res$fit has an accuracy of 0.9533333, but 0.98 should be the correct value, shouldn't it?
For the total number of trainings, I guess it is something like popSize * iters * folds, where folds is the number argument in ga_ctrl, but I am not sure.
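To make my guess concrete, here is the arithmetic with the values from the call above. This is only a sketch of my own assumption, not something taken from the caret documentation. The variable names are mine, and the second figure assumes that the "+ final GA" block in the log is one additional GA run on the full data set. Neither figure accounts for the inner resampling requested through trControl, which would presumably multiply the count further.

```r
# Values copied from the gafs() and gafsControl() calls above.
pop_size <- 6        # popSize
iters <- 5           # iters
external_folds <- 3  # number in gafsControl

# My guess: one evaluation per individual, per generation, per external fold.
guess <- pop_size * iters * external_folds
print(guess)

# Assumption: the "+ final GA" step is one more GA run on all the data.
with_final_ga <- pop_size * iters * (external_folds + 1)
print(with_final_ga)
```

With these settings the first figure is 90 and the second is 120.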
To get the best features, is res$optVariables correct?
Answer 0 (score: 0)