Question

I am using scikit-learn as a library for classification algorithms such as NB and SVM.

Here is a nice binary classification implementation for "SPAM and HAM" email:

    confusion += confusion_matrix(test_y, predictions)
    score = f1_score(test_y, predictions, pos_label=SPAM)
    # Note: in my 3-class case I do not need to set pos_label

If I have three classes such as {SPAM, HAM, NORMAL} instead of two, how can I adapt this code to find the F1 score for each class as well as the average over all classes?

Answer 1

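The problem here is that the F1 measure is not really meaningful for multi-class problems. It is the harmonic mean of precision and recall.

Precision is the probability that a (randomly selected) instance classified as positive is actually positive.

Recall is the probability that a (randomly selected) positive instance is classified as positive.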
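To make the two definitions concrete, here is a minimal sketch in plain Python that treats one class as "positive" and every other class as "negative". The helper name `one_vs_rest_scores` and the toy labels are made up for illustration and do not come from the question.

```python
def one_vs_rest_scores(y_true, y_pred, positive):
    """Precision, recall and F1 when `positive` is the only positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never occurs
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels, invented for this example
y_true = ["SPAM", "SPAM", "HAM", "HAM", "NORMAL", "NORMAL"]
y_pred = ["SPAM", "HAM", "HAM", "HAM", "NORMAL", "SPAM"]

for label in ("SPAM", "HAM", "NORMAL"):
    p, r, f = one_vs_rest_scores(y_true, y_pred, label)
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Repeating the calculation once per class is what makes an inherently binary measure usable in a three-class setting.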

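These definitions are inherently binary. Usually I would report the F1 measure for each class separately. This also lets you decide which type of failure is acceptable. In my personal experience, I would actually report precision and recall. In your example, classifying a HAM email as SPAM would be really harmful, so precision on SPAM is much more important than recall.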
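One possible way to get those per-class numbers in scikit-learn is `precision_recall_fscore_support` with `average=None`. This is only a sketch: the names `test_y` and `predictions` are borrowed from the question, and the label values below are toy data invented for the example.

```python
from sklearn.metrics import precision_recall_fscore_support

labels = ["SPAM", "HAM", "NORMAL"]

# Toy data, invented for this example; in practice these come from your test set
test_y = ["SPAM", "SPAM", "HAM", "HAM", "NORMAL", "NORMAL"]
predictions = ["SPAM", "HAM", "HAM", "HAM", "NORMAL", "SPAM"]

# average=None returns one value per class, in the order given by `labels`
precision, recall, f1, support = precision_recall_fscore_support(
    test_y, predictions, labels=labels, average=None
)

for name, p, r, f, n in zip(labels, precision, recall, f1, support):
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f} (n={n})")
```

Printing precision and recall next to F1 makes it easy to see, for instance, whether a low SPAM score comes from false alarms or from missed spam.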

For a broader overview that also covers a range of other measures, you could also have a look at http://rali.iro.umontreal.ca/rali/sites/default/files/publis/SokolovaLapalme-JIPM09.pdf

Answer 2

Use the classification report in sklearn to compute the F-score for multiple classes.

    from sklearn.metrics import classification_report as cr

    gold = []
    pred = []
    # Given a test set annotated with gold labels
    for testinstance, goldlabel in testdata:
        gold.append(goldlabel)
        # clf is your fitted classifier object with a predict method.
        # predict expects a 2D input and returns an array, so (assuming
        # testinstance is a single feature vector) wrap it and take element 0.
        predictedlabel = clf.predict([testinstance])[0]
        pred.append(predictedlabel)
    print(cr(gold, pred, digits=4))
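If a single averaged number is needed in code rather than in a printed report, one option is the `average` parameter of `f1_score`. The sketch below uses toy labels invented for the example. Which average is appropriate depends on how much weight rare classes should get.

```python
from sklearn.metrics import f1_score

# Toy data, invented for this example
gold = ["SPAM", "SPAM", "HAM", "HAM", "NORMAL", "NORMAL"]
pred = ["SPAM", "HAM", "HAM", "HAM", "NORMAL", "SPAM"]

# Unweighted mean of the per-class F1 scores
macro_f1 = f1_score(gold, pred, average="macro")
# Mean of the per-class F1 scores, weighted by the number of true instances per class
weighted_f1 = f1_score(gold, pred, average="weighted")
# Computed from the global counts of true positives, false positives and false negatives
micro_f1 = f1_score(gold, pred, average="micro")

print(f"macro={macro_f1:.4f} weighted={weighted_f1:.4f} micro={micro_f1:.4f}")
```

With equally sized classes, as in this toy data, the macro and weighted averages coincide. They diverge once the class sizes differ.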

How to compute the F1 measure for each class in multi-class classification?
