Unable to adapt existing test/train model to 10-fold cross-validation

Date: 2015-12-25 16:04:19

Tags: matlab random-forest cross-validation confusion-matrix

I am very new to machine learning, but by following my course materials I have managed to run a random forest on my data and get a meaningful error rate (it beats a dumb predictor, and improves as I select better features).

My predictor matrix (z-scored; this is a subset) is:

-0.0767889379600161 1.43666113298993    4.83220576535887    4.59650550158967
-0.0767889379600161 -0.114493297876403  -0.217229093905045  -0.187718580390875
-0.0767889379600161 -0.114493297876403  -0.217229093905045  -0.187718580390875
-0.0767889379600161 -0.114493297876403  -0.187208672625236  -0.00955946380486005
-0.0767889379600161 -0.114493297876403  -0.217229093905045  -0.187718580390875
-0.0767889379600161 -0.114493297876403  -0.217229093905045  -0.187718580390875
7.39424877391969    1.12643024681666    -0.145180082833503  -0.187718580390875
-0.0767889379600161 2.05712290533646    -0.211225009649084  -0.187718580390875
-0.0767889379600161 0.195737588296863   1.35584098115696    0.229434473078818

and my responses are:

'Highly Active'
'Inactive'
'Inactive'
'Inactive'
'Inactive'
'Highly Active'
'Highly Active'
'Highly Active'
'Inactive'
'Highly Active'
'Inactive'
'Highly Active'

My approach so far has been:

rng default
c = cvpartition(catresponse, 'HoldOut', 0.3);

% Extract the indices of the training and test sets.
trainIdx = training(c);
testIdx = test(c);
% Create the training and test data sets.
XTrain = predictormatrix(trainIdx, :);
XTest = predictormatrix(testIdx, :);
yTrain = catresponse(trainIdx);
yTest = catresponse(testIdx);

% Create an ensemble of 100 trees.
forestModel = fitensemble(XTrain, yTrain, 'Bag', 100,...
                            'Tree', 'Type', 'Classification'); 

% Predict and evaluate the ensemble model.
forestPred = predict(forestModel, XTest);
% errs = forestPred ~= yTest;
% testErrRateForest = 100*sum(errs)/numel(errs);
% display(testErrRateForest)

% Perform 10-fold cross validation.
cvModel = crossval(forestModel); % 10-fold is default 
cvErrorForest = 100*kfoldLoss(cvModel);
display(cvErrorForest)

% Confusion matrix.
C = confusionmat(yTest, forestPred);
figure(figOpts{:})  % figOpts: a cell array of figure Name,Value options defined earlier in my script
imagesc(C)
colorbar
colormap('cool')
[Xgrid, Ygrid] = meshgrid(1:size(C, 1));
Ctext = num2str(C(:));
text(Xgrid(:), Ygrid(:), Ctext)
labels = categories(catresponse);
set(gca, 'XTick', 1:size(C, 1), 'XTickLabel', labels, ...
         'YTick', 1:size(C, 1), 'YTickLabel', labels, ...
         'XTickLabelRotation', 30, ...
         'TickLabelInterpreter', 'none')
xlabel('Predicted Class')
ylabel('Known Class')
title('Forest Confusion Matrix')
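
For comparison, here is a minimal sketch of what I think would give a cross-validated confusion matrix, assuming kfoldPredict returns out-of-fold predictions for the partitioned model (and noting it still only covers the 70% training split):

% Sketch: cross-validated predictions from the partitioned ensemble.
% kfoldPredict should return, for each observation in XTrain, the label
% predicted by the fold model that did not see that observation.
cvPred = kfoldPredict(cvModel);
Ccv = confusionmat(yTrain, cvPred);
display(Ccv)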

Questions:

  • Am I cross-validating the right way? My kfoldLoss call is applied to a model built from a 30% holdout split rather than to one built with a cvpartition 'KFold' partition, so I am worried about what kfoldLoss is actually computing here.
  • Is my confusion matrix based on the cross-validation, or on the simpler holdout split in the code above?
  • How do I change my code so that the entire model is "cross-validated"? (A rough sketch of what I am imagining is below.)
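
What I suspect I need (a rough sketch only, assuming fitensemble accepts the 'KFold' cross-validation option and that kfoldLoss/kfoldPredict then operate across all ten folds) is to cross-validate the ensemble over the full data set instead of the 70% training split:

rng default
% Sketch: build the bagged ensemble with 10-fold cross-validation over the
% FULL data set, so every observation is predicted by a model that never saw it.
cvForest = fitensemble(predictormatrix, catresponse, 'Bag', 100, ...
                       'Tree', 'Type', 'Classification', 'KFold', 10);

% Cross-validated misclassification rate over all observations.
cvErrorForest = 100*kfoldLoss(cvForest);
display(cvErrorForest)

% Cross-validated predictions and confusion matrix over the whole data set.
cvPred = kfoldPredict(cvForest);
Ccv = confusionmat(catresponse, cvPred);
display(Ccv)

But I am not sure whether this is the right way to combine cross-validation with fitensemble, or whether the holdout split still has a role here.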

0 Answers:

There are no answers yet.