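I'm very new to machine learning, but I've followed the material for my course and have been able to use a random forest on my data and get meaningful error rates (it beats a naive prediction and improves as I choose better features).
My predictor matrix (z-scored; this is a subset) is: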
-0.0767889379600161 1.43666113298993 4.83220576535887 4.59650550158967
-0.0767889379600161 -0.114493297876403 -0.217229093905045 -0.187718580390875
-0.0767889379600161 -0.114493297876403 -0.217229093905045 -0.187718580390875
-0.0767889379600161 -0.114493297876403 -0.187208672625236 -0.00955946380486005
-0.0767889379600161 -0.114493297876403 -0.217229093905045 -0.187718580390875
-0.0767889379600161 -0.114493297876403 -0.217229093905045 -0.187718580390875
7.39424877391969 1.12643024681666 -0.145180082833503 -0.187718580390875
-0.0767889379600161 2.05712290533646 -0.211225009649084 -0.187718580390875
-0.0767889379600161 0.195737588296863 1.35584098115696 0.229434473078818
My responses are:
'Highly Active'
'Inactive'
'Inactive'
'Inactive'
'Inactive'
'Highly Active'
'Highly Active'
'Highly Active'
'Inactive'
'Highly Active'
'Inactive'
'Highly Active'
My approach so far:
rng default
c = cvpartition(catresponse, 'HoldOut', 0.3);
% Extract the indices of the training and test sets.
trainIdx = training(c);
testIdx = test(c);
% Create the training and test data sets.
XTrain = predictormatrix(trainIdx, :);
XTest = predictormatrix(testIdx, :);
yTrain = catresponse(trainIdx);
yTest = catresponse(testIdx);
% Create an ensemble of 100 trees.
forestModel = fitensemble(XTrain, yTrain, 'Bag', 100, ...
    'Tree', 'Type', 'Classification');
% Predict and evaluate the ensemble model.
forestPred = predict(forestModel, XTest);
% errs = forestPred ~= yTest;
% testErrRateForest = 100*sum(errs)/numel(errs);
% display(testErrRateForest)
% Perform 10-fold cross validation.
cvModel = crossval(forestModel); % 10-fold is default
cvErrorForest = 100*kfoldLoss(cvModel);
display(cvErrorForest)
% Confusion matrix.
C = confusionmat(yTest, forestPred);
figure(figOpts{:})
imagesc(C)
colorbar
colormap('cool')
[Xgrid, Ygrid] = meshgrid(1:size(C, 1));
Ctext = num2str(C(:));
text(Xgrid(:), Ygrid(:), Ctext)
labels = categories(catresponse);
set(gca, 'XTick', 1:size(C, 1), 'XTickLabel', labels, ...
    'YTick', 1:size(C, 1), 'YTickLabel', labels, ...
    'XTickLabelRotation', 30, ...
    'TickLabelInterpreter', 'none')
xlabel('Predicted Class')
ylabel('Known Class')
title('Forest Confusion Matrix')
Question:
My cvpartition is a HoldOut split rather than a KFold one like that, so I'm worried about what cvLoss is actually computing here.
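To make the question concrete, here is a minimal index-only sketch in plain MATLAB (no Statistics Toolbox calls) of how the two splits in the code above nest. It assumes that crossval on a fitted ensemble re-partitions only the data stored in that model, i.e. the XTrain/yTrain rows. The size n = 40 and the round-robin fold assignment are hypothetical stand-ins for the real data and for however crossval actually assigns its folds.

```matlab
% Hypothetical sizes: n observations, 30% holdout, then 10-fold CV.
n = 40;                                  % hypothetical number of observations
k = 10;                                  % number of folds (crossval's default)

% Holdout split, analogous to cvpartition(..., 'HoldOut', 0.3).
perm = randperm(n);
nTest = round(0.3 * n);
testIdx = false(n, 1);
testIdx(perm(1:nTest)) = true;
trainIdx = ~testIdx;

% Assumption: crossval(forestModel) only sees the rows the model was
% trained on (trainIdx). Assign those rows to k folds round-robin; this
% is a stand-in, not crossval's actual fold-assignment rule.
trainRows = find(trainIdx);
foldOf = zeros(n, 1);                    % 0 = row never appears in any fold
foldOf(trainRows) = mod(randperm(numel(trainRows)) - 1, k) + 1;

% Every training row is held out in exactly one fold ...
assert(all(foldOf(trainIdx) >= 1 & foldOf(trainIdx) <= k))
% ... and no holdout row is ever used by the k-fold step.
assert(all(foldOf(testIdx) == 0))

foldSizes = accumarray(foldOf(trainIdx), 1)';   % rows per fold
fprintf('%d training rows in %d folds; %d holdout rows unused\n', ...
        nnz(trainIdx), k, nnz(testIdx));
disp(foldSizes)
```

Under that assumption, a k-fold error such as cvErrorForest is estimated from the 70% training rows only, while the confusion matrix C is computed from the 30% holdout rows, so the two numbers describe different subsets of the data.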