How to find true positives, true negatives, false positives, and false negatives in Python

Time: 2015-03-30 15:54:12

Tags: python classification metrics

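I have trained a classifier in Python, and I want to find the true positives, true negatives, false positives, and false negatives when it classifies new data. The problem is that my labels_true always consists of a single repeated value, because the problem I am studying has only one label, and I want to see how well the classifier recognizes that label on new data. E.g.: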

labels_true = [2, 2, 2, 2, ..., 2]
labels_predicted = [2, 2, 23, 2, 2, 2, 2, 21, ..., 2, 2, 2, 2]

Of course, `len(labels_true) == len(labels_predicted)`. Since I have only one true label, how can I compute the metrics above?

1 answer:

Answer 0: (score: 2)

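If your labels_true contains only true values, you can only find true positives (TP) and false negatives (FN), because there are no false values that could be correctly rejected (true negatives, TN) or wrongly flagged as positive (false positives, FP).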
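For the single-label case in the question, a minimal sketch could look like the following. The data here is made up for illustration (the lists in the question are elided), and the name `positive` is an assumption, not something from the original post. With no negative examples, the only ratio you can meaningfully report is recall.

```python
# Hypothetical data: every true label is the single class of interest (2).
labels_true = [2, 2, 2, 2, 2, 2]
labels_predicted = [2, 2, 23, 2, 21, 2]

positive = 2  # assumed name for the only class present in labels_true

pairs = list(zip(labels_true, labels_predicted))

# Predicting the positive class is a hit (TP); predicting anything else is a miss (FN).
tp = sum(1 for t, p in pairs if t == positive and p == positive)
fn = sum(1 for t, p in pairs if t == positive and p != positive)

# There are no negative examples, so TN and FP are necessarily zero.
tn = fp = 0

recall = tp / (tp + fn)
print("TP: {} FN: {} TN: {} FP: {} recall: {:.2f}".format(tp, fn, tn, fp, recall))
```

To get non-zero TN and FP counts, the evaluation set would also need examples whose true label is not the class of interest.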

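TP, TN, FP, and FN apply to binary classification problems. Either analyze the whole confusion matrix, or bin the classes to obtain a binary problem.

Here is a solution with binning: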

from collections import Counter

truth      = [1, 2, 1, 2, 1, 1, 1, 2, 1, 3, 4, 1]
prediction = [1, 1, 2, 1, 1, 2, 1, 2, 1, 4, 4, 3]

confusion_matrix = Counter()

# Treat classes 1 and 3 as positive; all other classes are negative
positives = [1, 3]

binary_truth = [x in positives for x in truth]
binary_prediction = [x in positives for x in prediction]
print(binary_truth)
print(binary_prediction)

# Count each (truth, prediction) pair
for t, p in zip(binary_truth, binary_prediction):
    confusion_matrix[t, p] += 1

print("TP: {} TN: {} FP: {} FN: {}".format(
    confusion_matrix[True, True],
    confusion_matrix[False, False],
    confusion_matrix[False, True],
    confusion_matrix[True, False],
))

Edit: here is a full confusion matrix

from collections import Counter

truth      = [1, 2, 1, 2, 1, 1, 1, 2, 1, 3, 4, 1]
prediction = [1, 1, 2, 1, 1, 2, 1, 2, 1, 4, 4, 3]

# make confusion matrix
confusion_matrix = Counter()
for t, p in zip(truth, prediction):
    confusion_matrix[t,p] += 1

# print confusion matrix (rows: truth, columns: prediction)
labels = sorted(set(truth + prediction))
print("t/p", *labels)
for t in labels:
    print(t, *(confusion_matrix[t, p] for p in labels))
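If you want TP/TN/FP/FN per class rather than for one fixed binning, you can read them off the full confusion matrix by treating each class in turn as the positive one (one-vs-rest). The following is only a sketch under that assumption; the helper name `one_vs_rest_counts` and the `per_class` dictionary are made up for this example.

```python
from collections import Counter

truth      = [1, 2, 1, 2, 1, 1, 1, 2, 1, 3, 4, 1]
prediction = [1, 1, 2, 1, 1, 2, 1, 2, 1, 4, 4, 3]

# Same confusion matrix as above, keyed by (truth, prediction)
confusion_matrix = Counter(zip(truth, prediction))
labels = sorted(set(truth + prediction))
total = len(truth)

def one_vs_rest_counts(label):
    """Return (TP, TN, FP, FN) with `label` as the positive class."""
    tp = confusion_matrix[label, label]
    # Row of the matrix minus the diagonal: true `label`, predicted something else
    fn = sum(confusion_matrix[label, p] for p in labels if p != label)
    # Column of the matrix minus the diagonal: predicted `label`, truly something else
    fp = sum(confusion_matrix[t, label] for t in labels if t != label)
    # Everything that involves `label` neither as truth nor as prediction
    tn = total - tp - fn - fp
    return tp, tn, fp, fn

per_class = {label: one_vs_rest_counts(label) for label in labels}
for label in labels:
    print("class {}: TP={} TN={} FP={} FN={}".format(label, *per_class[label]))
```

Note that the four counts always sum to the number of samples for every class, which is a quick sanity check on the result.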