How is reliability calculated in Mahout?

Time: 2014-05-17 14:54:53

Tags: classification mahout

I get the following output from a classification run in Mahout:

======================================================= 
Summary 
------------------------------------------------------- 
Correctly Classified Instances          :       3948       93,4217%
Incorrectly Classified Instances        :        278        6,5783% 
Total Classified Instances              :       4226 

======================================================= 
Confusion Matrix 
------------------------------------------------------- 
a       b       <--Classified as 
3747    263      |  4010        a     = NOT_Science fiction 
15      201      |  216         b     = Science fiction 

======================================================= 
 Statistics 
------------------------------------------------------- 
Kappa                                       0,5594 
Accuracy                                   93,4217% 
Reliability                                62,1657% 
Reliability (standard deviation)            0,5384

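How does Mahout calculate the reliability?

According to https://issues.apache.org/jira/browse/MAHOUT-941, it should be the user's accuracy. As I understand user's accuracy, it is computed per column: the number of correctly classified instances divided by the total number of instances classified as that class. (http://spatial-analyst.net/ILWIS/htm/ilwismen/confusion_matrix.htm)

So far I have not been able to work out how the 62,1657% is calculated.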

If I average the per-class (row) ratios, I get: ((3747/4010)+(201/216))/2 = 0.932 -> 93.2%

If I average the per-column ratios, I get: ((3747/3762)+(201/464))/2 = 0.715 -> 71.5%
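
The two averages above can be checked with a small sketch. It assumes that rows of the printed matrix are the actual classes and columns are the predicted classes, as the "<--Classified as" header suggests. The class and method names are hypothetical and are not part of Mahout.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Mahout.
final class RowColumnAccuracy {

    // Mean over classes of diagonal / row total
    // (assumption: rows are the actual classes).
    static double meanRowAccuracy(int[][] m) {
        double sum = 0;
        for (int i = 0; i < m.length; i++) {
            int rowTotal = 0;
            for (int j = 0; j < m.length; j++) {
                rowTotal += m[i][j];
            }
            sum += (double) m[i][i] / rowTotal;
        }
        return sum / m.length;
    }

    // Mean over classes of diagonal / column total
    // (assumption: columns are the predicted classes).
    static double meanColumnAccuracy(int[][] m) {
        double sum = 0;
        for (int j = 0; j < m.length; j++) {
            int colTotal = 0;
            for (int i = 0; i < m.length; i++) {
                colTotal += m[i][j];
            }
            sum += (double) m[j][j] / colTotal;
        }
        return sum / m.length;
    }

    public static void main(String[] args) {
        // Values copied from the confusion matrix in the question.
        int[][] m = {{3747, 263}, {15, 201}};
        System.out.println(String.format(Locale.ROOT, "row average:    %.3f", meanRowAccuracy(m)));
        System.out.println(String.format(Locale.ROOT, "column average: %.3f", meanColumnAccuracy(m)));
    }
}
```

Neither value is close to the reported 62,1657%, so the reported figure cannot be a plain row or column average over the two visible classes.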

1 answer:

Answer 0 (score: 0)

Reliability is the user's accuracy. It is not calculated correctly in the current version (0.9):

public RunningAverageAndStdDev getNormalizedStats() {
    RunningAverageAndStdDev summer = new FullRunningAverageAndStdDev();
    for(int d = 0; d < confusionMatrix.length; d++) {
        double total = 0;
        for(int j = 0; j < confusionMatrix.length; j++) {
            total += confusionMatrix[d][j];
        }
        summer.addDatum(confusionMatrix[d][d] / (total + 0.000001));
    }
    return summer;
}

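The problem is that the confusion matrix contains all of the labels plus one extra "DEFAULT" label. The "DEFAULT" label appears to be intended for unclassified instances. If you have no unclassified instances, its row contributes a ratio of 0 to the average and distorts the result. Adding a check that skips the "DEFAULT" label worked for me: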

public RunningAverageAndStdDev getNormalizedStats() {
    RunningAverageAndStdDev summer = new FullRunningAverageAndStdDev();
    for (int d = 0; d < confusionMatrix.length; d++) {

        // Do not add the "DEFAULT" label to the calculation
        if (labelMap.get(defaultLabel) == d)
            continue;

        double total = 0;
        for (int j = 0; j < confusionMatrix.length; j++) {
            total += confusionMatrix[d][j];
        }
        summer.addDatum(confusionMatrix[d][d] / (total + 0.000001));
    }
    return summer;
}
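
To see how the extra row can produce the reported figures, here is a minimal stand-alone sketch. It assumes the internal matrix for this run is 3x3, with an all-zero row and column for "DEFAULT" (no unclassified instances), and it uses the sample standard deviation. The class and method names are hypothetical, and the code does not call Mahout.

```java
import java.util.Locale;

// Hypothetical stand-alone check; not Mahout code.
final class DefaultLabelEffect {

    // Per-row ratio diagonal / (row total + epsilon), mirroring the loop above.
    // Pass skipRow = -1 to keep every row, or a row index to leave it out.
    static double[] rowRatios(int[][] m, int skipRow) {
        boolean skipping = skipRow >= 0 && skipRow < m.length;
        double[] ratios = new double[skipping ? m.length - 1 : m.length];
        int k = 0;
        for (int d = 0; d < m.length; d++) {
            if (d == skipRow) {
                continue;
            }
            double total = 0;
            for (int j = 0; j < m.length; j++) {
                total += m[d][j];
            }
            ratios[k++] = m[d][d] / (total + 0.000001);
        }
        return ratios;
    }

    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Sample standard deviation (divides by n - 1).
    static double sampleStdDev(double[] xs) {
        double mu = mean(xs);
        double sq = 0;
        for (double x : xs) {
            sq += (x - mu) * (x - mu);
        }
        return Math.sqrt(sq / (xs.length - 1));
    }

    public static void main(String[] args) {
        // Assumed layout: the third row and column belong to "DEFAULT" and are all zero.
        int[][] m = {{3747, 263, 0}, {15, 201, 0}, {0, 0, 0}};
        double[] withDefault = rowRatios(m, -1);
        double[] withoutDefault = rowRatios(m, 2);
        System.out.println(String.format(Locale.ROOT, "with DEFAULT:    mean %.4f%%, std dev %.4f",
                100 * mean(withDefault), sampleStdDev(withDefault)));
        System.out.println(String.format(Locale.ROOT, "without DEFAULT: mean %.4f%%",
                100 * mean(withoutDefault)));
    }
}
```

Under these assumptions the mean with the "DEFAULT" row comes out at roughly 62.17% with a standard deviation of roughly 0.538, which agrees with the Reliability lines in the question's output. Leaving the row out gives roughly 93.25%, the row average the asker computed by hand.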