Confusion matrix error: fewer test samples than training samples

Asked: 2020-10-23 08:27:04

Tags: matlab machine-learning training-data confusion-matrix

I have a problem computing my model's accuracy. I am using the code below:

y_train = [ 1  1  1  4  4  3  3  5 5 5 ]; % true labels for x_train
%x_test : has no true labels. 
predictedLabel=[ 1 2 3 4 5 ]; % predicted labels for x_test

group = y_train;            % 10 true labels (training data)
grouphat = predictedLabel;  % 5 predicted labels (test data)
C = confusionmat(group, grouphat);
Accuracy = sum(diag(C)) / sum(C(:)) * 100;

But I get this error:

Error using confusionmat (line 75)
G and GHAT need to have same number of rows

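Am I getting this error because the test set has a different number of samples than the training set? The test data has no true labels (semi-supervised learning).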

1 Answer:

Answer 0 (score: 1)

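Your training labels and your predicted labels are based on different inputs, so comparing them in a confusion matrix makes no sense. From the confusionmat docs: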

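"returns the confusion matrix C determined by the known and predicted groups"

That is, the known and the predicted labels must refer to the same data, so both inputs have one entry per sample and therefore the same length.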
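As a minimal sketch of what a valid call looks like, suppose true labels were available for the five test samples. The vector y_test below is hypothetical and does not appear in the question; it only illustrates that both inputs describe the same five samples. confusionmat requires the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical true labels for the 5 test samples (assumed for illustration only)
y_test = [1 2 3 5 5];
% Predicted labels for the same 5 test samples, as in the question
predictedLabel = [1 2 3 4 5];

% Both vectors must have one entry per test sample
assert(numel(y_test) == numel(predictedLabel));

% Rows of C are true classes, columns are predicted classes
C = confusionmat(y_test, predictedLabel);

% Correct predictions lie on the diagonal
accuracy = 100 * sum(diag(C)) / sum(C(:));   % 4 of 5 correct, i.e. 80
```

Without true labels for the test data, as in the semi-supervised setting described in the question, there is nothing to compare the predictions against, so an accuracy of this kind cannot be computed for x_test.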

Take the following example, which is partly pseudo-code; see the comments for details:

% split your input data
trainData = data(1:100, :);  % Training data
testData = data(101:120, :); % Testing data (mutually exclusive from training)
% Do some training (pseudo-code, not valid MATLAB)
% ** Let's assume that the labels are in column 1 **
model = train( trainData(:,1), trainData(:,2:end) );
% Test your model on the input data, excluding the actual labels in column 1
predictedLabels =  model( testData(:,2:end) );
% Get the actual labels from column 1
actualLabels = testData(:,1);
% Note that size(predictedLabels) == size(actualLabels)
% Now we can do a confusion matrix
C = confusionmat( actualLabels, predictedLabels )
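A runnable variant of the same idea might look like the sketch below. The dataset (fisheriris), the classifier (fitcknn with 5 neighbours) and the 80/20 holdout split are assumptions chosen only for illustration; they are not part of the answer above. All functions used here come from the Statistics and Machine Learning Toolbox.

```matlab
% Example data shipped with the toolbox: meas (150x4 features), species (150x1 labels)
load fisheriris

% Hold out 20% of the labelled data for testing (an assumed split ratio)
rng(0);                                      % fix the seed so the split is repeatable
cv = cvpartition(species, 'HoldOut', 0.2);
trainIdx = training(cv);
testIdx  = test(cv);

% Train on the training portion only
mdl = fitcknn(meas(trainIdx, :), species(trainIdx), 'NumNeighbors', 5);

% Predict labels for the held-out samples
predictedLabels = predict(mdl, meas(testIdx, :));
actualLabels    = species(testIdx);

% Both label arrays refer to the same held-out samples, so their sizes match
assert(numel(predictedLabels) == numel(actualLabels));

C = confusionmat(actualLabels, predictedLabels);
accuracy = 100 * sum(diag(C)) / sum(C(:));
```

When the real test set is unlabelled, holding out part of the labelled training data in this way is one option for estimating accuracy, since the held-out samples have both known and predicted labels.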