How to get the classification accuracy from the confusion matrix using LIBSVM?

Asked: 2014-01-19 05:59:11

Tags: libsvm

Friends, I am currently working on an SVM classifier with LIBSVM (5-fold cross-validation). The code is given below. In total the data is 150 x 4 with 3 classes, so in each fold trainData is 120 x 4 and testData is 30 x 4. The problem is that I have to obtain the classification accuracy from the confusion matrix. I need answers to the following questions:

  1. How do I get the classification accuracy for each class from the confusion matrix?
  2. What are the probability estimates needed for?
  3. What does the phrase "predict the class with the highest probability" mean?
  4. I do not understand the result of "acc"?

    Thanks in advance, friends.

    The code is:

    load fisheriris                   %# Fisher Iris dataset
    [~,~,labels] = unique(species);   %# labels: 1/2/3
    data = zscore(meas);              %# scale features
    numInst = size(data,1);
    numLabels = max(labels);  
    FISH =[];
    
    numFolds = 5;
    for jj = 1:5                                          %# number of iterations
    
        indices = crossvalind('Kfold',labels,numFolds);   %# K-fold validation
        for ii = 1:numFolds
            test  = (indices == ii);
            train = ~test;
    
            %# split training/testing
            idx = randperm(numInst);
            numTrain = 120; numTest = numInst - numTrain;
            trainData = data(idx(1:numTrain),:);  testData = data(idx(numTrain+1:end),:);
            trainLabel = labels(idx(1:numTrain)); testLabel = labels(idx(numTrain+1:end));
    
            %# train one-against-all models
            model = cell(numLabels,1);
            for k = 1:numLabels
                model{k} = svmtrain(double(trainLabel==k), trainData, '-t 2 -c 1 -g 1 -b 1');
            end
    
            %# get probability estimates of test instances using each model
            prob = zeros(numTest,numLabels);
            for k = 1:numLabels
                [~,~,p] = svmpredict(double(testLabel==k), testData, model{k}, '-b 1');
                prob(:,k) = p(:,model{k}.Label==1);        %# probability of class==k
            end
    
            %# predict the class with the highest probability
            [~,pred] = max(prob,[],2);
            acc = sum(pred == testLabel) ./ numel(testLabel)  %# accuracy
            CM = confusionmat(testLabel, pred)                %# confusion matrix
        end
        FISH = [FISH; (CM(1,1)/10)*100 (CM(2,2)/10)*100 (CM(3,3)/10)*100];
    end
    

1 Answer:

Answer 0 (score: 1)

1. How to get the classification accuracy for each class from confusion matrix??

Your code handles the 3-class classification problem by converting it into one-vs-all problems. More specifically, the expression double(trainLabel==k) assigns label 1 to the samples whose label equals k and label 0 to the rest. This is done for every class inside the for loop, and a model is saved for each case. In effect you end up with binary classification problems, so you can use the sensitivity and specificity measures. In general, given negative and positive labels, specificity measures how well the classifier identifies the negative labels, while sensitivity measures how well it identifies the positive labels. A good reference is here.
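As a rough sketch (assuming CM is the confusion matrix returned by confusionmat in the code above, with rows indexing the true classes and columns the predicted classes), the per-class accuracy (sensitivity/recall) and specificity could be computed like this:

    %# sketch: per-class measures from the confusion matrix CM
    %# (rows = true classes, columns = predicted classes)
    perClassAcc = diag(CM) ./ sum(CM,2);    %# sensitivity / recall per class

    numClasses = size(CM,1);
    spec = zeros(numClasses,1);
    for k = 1:numClasses
        TN = sum(CM(:)) - sum(CM(k,:)) - sum(CM(:,k)) + CM(k,k);  %# true negatives
        FP = sum(CM(:,k)) - CM(k,k);                              %# false positives
        spec(k) = TN / (TN + FP);           %# specificity per class
    end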

2. What is the need for probability estimates?

LIBSVM has a parameter option (-b 1) to train a model with probability information and to use probability estimates when predicting the test data. In the following code

    %# get probability estimates of test instances using each model
    prob = zeros(numTest,numLabels);
    for k = 1:numLabels
        [~,~,p] = svmpredict(double(testLabel==k), testData, model{k}, '-b 1');
        prob(:,k) = p(:,model{k}.Label==1);    %# probability of class==k
    end

as the comment indicates, we obtain the probability of each object belonging to a class. This is done by looping over the models and storing, for each sample, the probability that it belongs to the class specified by the variable k. This is repeated for all numLabels labels.

3. What the term refers "predict the class with the highest probability"?

The matrix prob has as many rows as there are samples and as many columns as there are labels. In each row, each column holds the probability that the sample belongs to the corresponding label. For example, if a row contains the numbers 0.7 0.2 0.1, the corresponding sample belongs to class 1.
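As a small made-up illustration (the numbers are invented, just to show how taking the maximum over the rows of prob yields the predicted label):

    %# illustration with made-up probabilities (3 samples, 3 classes)
    prob = [0.7 0.2 0.1;      %# sample 1 -> class 1
            0.1 0.1 0.8;      %# sample 2 -> class 3
            0.3 0.6 0.1];     %# sample 3 -> class 2
    [~,pred] = max(prob,[],2) %# pred = [1; 3; 2]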

4. I do not understand the result of "acc"???

acc is the classic accuracy measure: the total number of correctly classified samples divided by the total number of samples. It can also be obtained from the confusion matrix by summing its diagonal elements and dividing that sum by the total number of samples in the matrix.
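For example, assuming the pred, testLabel and CM variables from the code above, both ways of computing the overall accuracy give the same value:

    %# overall accuracy, computed two equivalent ways
    acc1 = sum(pred == testLabel) / numel(testLabel);  %# from the predictions
    acc2 = sum(diag(CM)) / sum(CM(:));                 %# from the confusion matrix diagonal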