Returning the best decision tree from cross-validation in Matlab

Asked: 2016-01-28 04:02:30

Tags: matlab validation tree classification

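Using Matlab, what is the correct way to find the model with the least error in a cross-validated fit? My goal is to show the error rate of the best cross-validated decision tree as a function of the test data size, and I have the following code: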

chess = csvread(filename);
predictors = chess(:,1:6);
class = chess(:,7);

cvpart = cvpartition(class,'holdout', 0.3);
Xtrain = predictors(training(cvpart),:);
Ytrain = class(training(cvpart),:);
Xtest = predictors(test(cvpart),:);
Ytest = class(test(cvpart),:);

numElements = numel(training(cvpart));
trainErrorGrowing = zeros(numElements,1);
testErrorGrowing = zeros(numElements,1);

for n = 100:numElements
    data = datasample(training(cvpart), n);
    dataX = predictors(data,:);
    dataY = class(data,:);

    % Fit the decision tree
    tree = fitctree(dataX, dataY, 'AlgorithmForCategorical', 'PullLeft', 'CrossVal', 'on');

    % Loop to find the model with the least error
    kfoldError = 100;
    bestTree = tree.Trained{1};
    for i = 1:10
        err = loss(tree.Trained{i}, Xtrain, Ytrain);
        if err < kfoldError
            kfoldError = err;
            bestTree = tree.Trained{i};
        end
    end
    trainErrorGrowing(n) = loss(bestTree,Xtest,Ytest,'Subtrees','all'); % Training Error
    testErrorGrowing(n) = loss(bestTree,Xtest,Ytest,'Subtrees','all'); % Testing Error
end

plot(numElements,testErrorGrowing);

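It is very important that the data used for the final test is not used in any way for training the tree. However, when I try to run this code, I get the error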

Error using classreg.learning.internal.classCount
You passed an unknown class '1' of type double.

on the line

err = loss(tree.Trained{i}, Xtrain, Ytrain);

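I have tried casting the iterator to int8 and to char, but I get the same error both times. Is there an easier way to find the resulting decision tree with the least error, or at least a way to reference the individual trained trees?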

1 Answer:

Answer 0 (score: 0)

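Suppose you are doing 10-fold cross-validation while learning the model. You can then use the kfoldLoss function to get the CV loss of every fold at once, and pick the trained model with the smallest CV loss as follows: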

modelLosses = kfoldLoss(tree,'mode','individual');

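If you did 10-fold cross-validation during learning, the code above gives you a vector of length 10 (one CV error value per fold). Suppose the trained model with the smallest CV error is the k-th one; you would then use: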

testSetPredictions = predict(tree.Trained{k}, testSetFeatures);
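
Putting these pieces together, a minimal end-to-end sketch might look like the following. It is an illustration under stated assumptions rather than a drop-in fix: the data is synthetic (it is not the chess file from the question), and the names X, Y, trainIdx, rows, cvTree and testError are made up for this example. It also samples training rows by row number instead of sampling from the logical mask returned by training(cvpart). Sampling the mask itself yields logical values rather than row positions, which is one plausible reason a fold's tree could be trained without ever seeing class 1.

```matlab
% Sketch only: synthetic stand-in data, not the chess dataset from the question.
rng(0);                                   % reproducible illustration
X = rand(600, 6);                         % hypothetical predictors
Y = double(X(:,1) + X(:,2) > 1);          % hypothetical class labels (0 or 1)

% Hold out 30% of the rows for the final test, as in the question.
cvpart   = cvpartition(Y, 'HoldOut', 0.3);
trainIdx = find(training(cvpart));        % row numbers, not a logical mask
Xtest = X(test(cvpart), :);
Ytest = Y(test(cvpart));

% Draw n training rows by index, without replacement.
n = 200;
rows = datasample(trainIdx, n, 'Replace', false);

% 10-fold cross-validated tree fitted on the sampled training rows only.
cvTree = fitctree(X(rows,:), Y(rows), 'CrossVal', 'on');

% One loss value per fold, then the index of the smallest one.
modelLosses = kfoldLoss(cvTree, 'Mode', 'individual');
[~, k] = min(modelLosses);
bestTree = cvTree.Trained{k};

% Evaluate the selected tree on the held-out test set.
testError = loss(bestTree, Xtest, Ytest);
testSetPredictions = predict(bestTree, Xtest);
```

Here k is chosen from the fold losses alone, so the held-out test rows play no part in fitting or selecting the tree. To trace error against sample size, the part from the datasample call onward could be wrapped in a loop over n, storing testError for each value.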