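What is the correct way to find the model with the least error in a cross-validated fit in Matlab? My goal is to show the error rate of the best cross-validated decision tree as a function of the test data size, and I have the following code: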
chess = csvread(filename);
predictors = chess(:,1:6);
class = chess(:,7);
cvpart = cvpartition(class,'holdout', 0.3);
Xtrain = predictors(training(cvpart),:);
Ytrain = class(training(cvpart),:);
Xtest = predictors(test(cvpart),:);
Ytest = class(test(cvpart),:);
numElements = numel(training(cvpart));
trainErrorGrowing = zeros(numElements,1);
testErrorGrowing = zeros(numElements,1);
for n = 100:numElements
    data = datasample(training(cvpart), n);
    dataX = predictors(data,:);
    dataY = class(data,:);
    % Fit the decision tree
    tree = fitctree(dataX, dataY, 'AlgorithmForCategorical', 'PullLeft', 'CrossVal', 'on');
    % Loop to find the model with the least error
    kfoldError = 100;
    bestTree = tree.Trained{1};
    for i = 1:10
        err = loss(tree.Trained{i}, Xtrain, Ytrain);
        if err < kfoldError
            kfoldError = err;
            bestTree = tree.Trained{i};
        end
    end
    trainErrorGrowing(n) = loss(bestTree,Xtest,Ytest,'Subtrees','all'); % Training Error
    testErrorGrowing(n) = loss(bestTree,Xtest,Ytest,'Subtrees','all'); % Testing Error
end
plot(numElements,testErrorGrowing);
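It is very important that the data used for the final test is not used in any way to train the trees. However, when I try to execute this code, I receive the error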
Error using classreg.learning.internal.classCount
You passed an unknown class '1' of type double.
on the line
err = loss(tree.Trained{i}, Xtrain, Ytrain);
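I have tried casting the iterator to int8 and to char, but I get the same error both times. Is there a simpler way to find the resulting decision tree with the least error, or at least a way to reference the individual trained trees?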
Answer 0 (score: 0)
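Suppose you are doing 10-fold cross-validation while learning the model. You can then use the kfoldLoss function to get the CV loss of every fold at once, and pick the trained model with the smallest CV loss, in the following way: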
modelLosses = kfoldLoss(tree,'mode','individual');
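If you used 10-fold cross-validation during learning, the code above gives you a vector of length 10 (one CV error value per fold). Assuming the trained model with the smallest CV error is the k-th one, you would then use: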
testSetPredictions = predict(tree.Trained{k}, testSetFeatures);
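To tie the two steps together, here is a minimal end-to-end sketch. It is only an illustration under assumptions: the fisheriris sample data that ships with the Statistics and Machine Learning Toolbox stands in for your chess data, and the names cvTree, minLoss, bestTree and testError are mine, not from your code. The index k comes from min over the per-fold losses, and the hold-out rows are never passed to fitctree.

```matlab
% Sketch only: fisheriris stands in for the asker's chess data.
rng(0);                                   % make the random partitions repeatable
load fisheriris                           % provides meas (150x4) and species (150x1 cell)

% Hold out 30% of the rows; they are used only for the final evaluation
cvpart = cvpartition(species, 'HoldOut', 0.3);
Xtrain = meas(training(cvpart), :);
Ytrain = species(training(cvpart));
Xtest  = meas(test(cvpart), :);
Ytest  = species(test(cvpart));

% 10-fold cross-validated tree, fitted on the training portion only
cvTree = fitctree(Xtrain, Ytrain, 'CrossVal', 'on');

% One loss value per fold, then the index of the smallest one
modelLosses = kfoldLoss(cvTree, 'Mode', 'individual');
[minLoss, k] = min(modelLosses);

% Evaluate the selected fold model on the untouched hold-out set
bestTree  = cvTree.Trained{k};
testError = loss(bestTree, Xtest, Ytest);
fprintf('fold %d: CV loss %.3f, hold-out error %.3f\n', k, minLoss, testError);
```

One thing to check in your own loop: training(cvpart) returns a logical vector, so datasample(training(cvpart), n) samples n true/false values rather than n row indices. Sampling from find(training(cvpart)) may be closer to what you intended.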