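Friends, I am currently working on an SVM classifier with LIBSVM, using 5-fold cross-validation. My code is below. In total the data has 150 x 4 vectors in 3 classes, so in each fold trainData = 120 x 4 and testData = 30 x 4. The problem is that I have to obtain the classification accuracy from the confusion matrix. I need answers to the following questions:
Thanks in advance, friends.
The code is: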
load fisheriris                              %# Fisher Iris dataset
[~,~,labels] = unique(species);              %# labels: 1/2/3
data = zscore(meas);                         %# scale features
numInst = size(data,1);
numLabels = max(labels);
FISH = [];
numFolds = 5;
for jj = 1:5                                 %# number of repetitions
    indices = crossvalind('Kfold',labels,numFolds);   %# K-fold assignment, stratified by label
    for ii = 1:numFolds
        test = (indices == ii);
        train = ~test;
        %# split training/testing using the fold masks
        numTest = sum(test);
        trainData = data(train,:);   testData = data(test,:);
        trainLabel = labels(train);  testLabel = labels(test);
        %# train one-against-all models
        model = cell(numLabels,1);
        for k = 1:numLabels
            model{k} = svmtrain(double(trainLabel==k), trainData, '-t 2 -c 1 -g 1 -b 1');
        end
        %# get probability estimates of test instances using each model
        prob = zeros(numTest,numLabels);
        for k = 1:numLabels
            [~,~,p] = svmpredict(double(testLabel==k), testData, model{k}, '-b 1');
            prob(:,k) = p(:,model{k}.Label==1);   %# probability of class==k
        end
        %# predict the class with the highest probability
        [~,pred] = max(prob,[],2);
        acc = sum(pred == testLabel) ./ numel(testLabel)   %# accuracy
        CM = confusionmat(testLabel, pred)                 %# confusion matrix
    end
    %# per-class accuracy (%) from the last fold's CM; assumes 10 test samples per class
    FISH = [FISH; (CM(1,1)/10)*100 (CM(2,2)/10)*100 (CM(3,3)/10)*100];
end
Answer 0 (score: 1)
1. How to get the classification accuracy for each class from confusion matrix??
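Your code handles the 3-class classification problem by converting it into a one-vs-all problem. More specifically, the code double(trainLabel==k) assigns label 1 to the samples whose label equals k and label 0 to all the rest. This is done for every class inside the for loop, and a model is saved for each case. In effect you have binary classification problems, for which you can use the sensitivity and specificity measures. In general, if you have negative and positive labels, specificity measures how well the classifier identifies negative labels, while sensitivity measures how well it identifies positive labels. A good reference is here.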
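To make the per-class measures concrete, here is a minimal sketch that derives them from a confusion matrix. The matrix values are made up for illustration; the layout (rows = true class, columns = predicted class) is the one confusionmat(testLabel, pred) returns.

```matlab
% Hypothetical 3-class confusion matrix (rows = true class, columns = predicted class).
CM = [10 0 0;
       0 9 1;
       0 2 8];

total = sum(CM(:));
TP = diag(CM);               % samples of class k predicted as class k
FN = sum(CM,2) - TP;         % samples of class k predicted as another class
FP = sum(CM,1)' - TP;        % samples of other classes predicted as class k
TN = total - TP - FN - FP;   % everything else

sensitivity = TP ./ (TP + FN);   % per-class accuracy (recall), one value per class
specificity = TN ./ (TN + FP);   % how well "not class k" is recognised
```

The sensitivity of class k is the diagonal entry CM(k,k) divided by the sum of row k, which is the "classification accuracy for each class" asked about.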
2. What is the need for probability estimates?
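LIBSVM has an option (-b 1) that trains a model with probability information and predicts the test data with probability estimates. The probabilities put the outputs of the separate one-vs-all models on a common scale, so they can be compared with each other. In the following code: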
%# get probability estimates of test instances using each model
prob = zeros(numTest,numLabels);
for k = 1:numLabels
    [~,~,p] = svmpredict(double(testLabel==k), testData, model{k}, '-b 1');
    prob(:,k) = p(:,model{k}.Label==1);   %# probability of class==k
end
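as the comment indicates, we obtain the probability that each sample belongs to a given class. The loop runs over the models and stores, for each sample, the probability of belonging to the class given by the variable k. This is done for all numLabels labels.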
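The indexing p(:,model{k}.Label==1) is worth a small illustration. As far as I know, LIBSVM orders the probability columns the same way as model.Label, which in turn follows the order in which the labels first appear in the training data, so the "positive" column is not always the same one. The sketch below uses hypothetical stand-ins (modelLabel, p) instead of real LIBSVM output, so it runs without the library.

```matlab
% Hypothetical stand-ins for LIBSVM output (illustrative values only):
% modelLabel plays the role of model{k}.Label, p the third output of svmpredict.
modelLabel = [0; 1];                % column 1 of p -> label 0, column 2 -> label 1
p = [0.8 0.2;
     0.3 0.7;
     0.6 0.4];                      % each row sums to 1

probK = p(:, modelLabel == 1);      % probability that each sample belongs to class k

% Same estimates with the label order reversed (positive label listed first).
modelLabelRev = [1; 0];
pRev = p(:, [2 1]);
probKRev = pRev(:, modelLabelRev == 1);
```

Selecting the column by label rather than by a fixed position gives the same result in both cases, which is why the original code indexes with model{k}.Label==1.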
3. What the term refers "predict the class with the highest probability"?
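The matrix prob has as many rows as there are samples and as many columns as there are labels. In each row, column k holds the probability that the sample belongs to label k. For example, if a row contains 0.7 0.2 0.1, the corresponding sample is assigned to class 1.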
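A minimal sketch of that step, with a made-up prob matrix (three samples, three classes):

```matlab
% Hypothetical probability matrix: one row per sample, one column per class.
prob = [0.7 0.2 0.1;
        0.1 0.3 0.6;
        0.2 0.5 0.3];

% max along dimension 2 returns, for each row, the largest value and its column index.
[maxProb, pred] = max(prob, [], 2);
% pred holds the predicted class of each sample, here 1, 3 and 2.
```

The second output of max is the column index, and because column k corresponds to class k, that index is the predicted label.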
4. I do not understand the result of "acc"???
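acc is the classic accuracy metric: the number of correctly classified samples divided by the total number of samples. It can also be obtained from the confusion matrix by summing the diagonal elements and dividing that sum by the sum of all elements of the matrix, which is the total number of test samples.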
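A short check that the two ways of computing the accuracy agree, using made-up labels. To keep the sketch free of toolbox dependencies, the confusion matrix is built with accumarray; confusionmat(testLabel, pred) should give the same matrix when all three classes are present.

```matlab
% Hypothetical true and predicted labels for 10 test samples.
testLabel = [1; 1; 2; 2; 2; 3; 3; 3; 3; 1];
pred      = [1; 1; 2; 3; 2; 3; 3; 2; 3; 1];

% Accuracy as computed in the question's code.
accDirect = sum(pred == testLabel) / numel(testLabel);

% Confusion matrix: rows = true class, columns = predicted class.
CM = accumarray([testLabel pred], 1, [3 3]);

% Accuracy from the confusion matrix: diagonal sum over total count.
accFromCM = sum(diag(CM)) / sum(CM(:));
```

Both values come out as 0.8 here, since 8 of the 10 samples lie on the diagonal.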