Understanding the output of ovrpredict in LIBSVM

Date: 2017-04-06 08:52:28

Tags: matlab libsvm

I am implementing multi-class classification with LIBSVM using the one-vs-rest strategy. For this I use the ovrtrain and ovrpredict MATLAB functions:

model = ovrtrain(GroupTrain, TrainingSet, '-t 0');
[predicted_labels ac decv] = ovrpredict(testY, TestSet, model);

The output of ovrpredict is the following:

Accuracy = 90% (18/20) (classification)
Accuracy = 90% (18/20) (classification)
Accuracy = 90% (18/20) (classification)
Accuracy = 95% (19/20) (classification)
Accuracy = 90% (18/20) (classification)
Accuracy = 90% (18/20) (classification)
Accuracy = 90% (18/20) (classification)
Accuracy = 90% (18/20) (classification)
Accuracy = 90% (18/20) (classification)
Accuracy = 90% (18/20) (classification)

I have 10 classes, and I am new to libsvm, so I am guessing these accuracies correspond to the classification accuracy of each class. However, I do not understand the difference between this output and the accuracy ac returned by ovrpredict, which is 60%.

ac =

    0.6000

Thanks.

1 Answer:

Answer 0 (score: 1)

The two values measure quite different things. Each Accuracy line is printed by the svmpredict() function and tells you how well your test set fits one specific class, while ac gives you the accuracy of the predicted class labels with respect to the input test labels (testY in your case).

Let's have a look inside the ovrpredict function and see how these accuracy values are generated.

function [pred, ac, decv] = ovrpredict(y, x, model)

From the definition, we can see that it takes 3 input parameters:

  1. y = the class labels
  2. x = the test data set
  3. model = a struct containing 10 models for 10 different classes

    labelSet = model.labelSet;

labelSet holds the unique class labels stored in the model. In your case you will have 10 unique labels, one for each of the 10 classes you defined in your training data.

labelSetSize = length(labelSet)

Here you get the number of classes (10 in your case).
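As a quick illustration (toy labels, assumed; ovrtrain stores the unique training labels in model.labelSet):

labelSet = unique([3; 1; 2; 2; 1])   % -> [1; 2; 3]
labelSetSize = length(labelSet)      % -> 3 here, 10 in your case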

 models = model.models;

The models variable contains all the trained models (10 in your case).

decv = zeros(size(y, 1), labelSetSize);

Here, the decv matrix is created to hold the decision values of each test sample for every class.

for i=1:labelSetSize
  [l,a,d] = svmpredict(double(y == labelSet(i)), x, models{i});
  decv(:, i) = d * (2 * models{i}.Label(1) - 1);
end

Here, we pass our test data to svmpredict once for each trained model. In your case, this loop iterates 10 times and prints the classification Accuracy of the test data for each specific class. For example, Accuracy = 90% (18/20) (classification) indicates that 18 out of the 20 rows of your test set were classified correctly in that binary (class vs. rest) sub-problem.
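To make the printed numbers concrete, here is a toy sketch (values assumed, not taken from your data) of what one Accuracy line measures for a single binary sub-problem:

yi  = double([1; 1; 2; 3; 1] == 1);      % binary ground truth for class 1 -> [1; 1; 0; 0; 1]
li  = [1; 0; 0; 0; 1];                   % labels that svmpredict (model 1) might return
acc = 100 * sum(li == yi) / numel(yi);   % 80, i.e. "Accuracy = 80% (4/5)"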

Please note that in multi-class SVM you cannot make a decision based on these Accuracy values alone. You need the pred and ac values to make per-sample and overall estimates, respectively.

double(y == labelSet(i)) converts the multi-class labels into single-class (binary) labels by checking which entries of y belong to the class that the iterator i is pointing to. It outputs 1 for matched and 0 for unmatched cases, so the resulting label vector contains only 0s and 1s, corresponding to a one-vs-rest SVM problem.
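A minimal sketch of that binarization, with assumed toy labels:

y = [2; 5; 7; 5; 2];              % multi-class test labels (toy values)
labelSet = unique(y);             % [2; 5; 7]
yi = double(y == labelSet(2));    % [0; 1; 0; 1; 0] -- 1 wherever the label is 5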

decv(:, i) = d * (2 * models{i}.Label(1) - 1) fixes the sign of the decision values so that a positive value always means "belongs to class i" and a negative value means "does not", regardless of the label ordering stored in the trained model. models{i}.Label(1) holds one of only two values, 0 (the model saw the "unmatched" label first) or 1 (it saw the "matched" label first), so (2 * models{i}.Label(1) - 1) always evaluates to -1 or 1, flipping the sign when necessary.
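A small sketch of that sign convention (toy values, assumed):

d = [0.8; -0.3];                 % raw decision values from svmpredict (toy values)
Label1 = 0;                      % suppose models{i}.Label(1) is 0 for this model
decv_i = d * (2 * Label1 - 1);   % -> [-0.8; 0.3]: signs flipped so a positive value
                                 %    always means "looks like class i"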

[tmp,pred] = max(decv, [], 2);
pred = labelSet(pred);

max returns two column vectors: the first (tmp) contains the maximum decision value in each row, and the second (pred) contains the column (i.e. class) index of that maximum. Since we are only interested in the class index, we discard tmp.
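For example, with assumed decision values for 2 test rows and 3 classes:

decv = [ 0.2  0.9 -0.1 ;         % row 1: column 2 has the largest value
        -0.5  0.3  0.7 ];        % row 2: column 3 has the largest value
[tmp, pred] = max(decv, [], 2);  % pred = [2; 3] (column indices)
labelSet = [10; 20; 30];
pred = labelSet(pred);           % -> [20; 30], the actual class labels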

ac = sum(y==pred) / size(x, 1);

Finally, we calculate ac by counting how many predicted labels match the input test labels and dividing that count by the number of test samples.
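As a toy example (labels assumed; they just happen to give the same 0.6):

y    = [1; 2; 3; 1; 2];                % true test labels (toy values)
pred = [1; 2; 1; 1; 3];                % predicted labels from the max step
ac   = sum(y == pred) / size(y, 1)     % 3 matches out of 5 samples -> 0.6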

In your case ac = 0.6 means that 60% of your test rows (12 of the 20) were assigned the correct label, while the remaining rows were misclassified.

I hope this answers your question.