Need help finding the precision and recall of a confusion matrix

Time: 2021-08-01 07:00:16

Tags: python numpy scikit-learn jupyter-notebook

I am trying to find the precision and recall for the confusion matrix given below, but I get an error. How would I do it using numpy and sklearn?

array([[748,   0,   4,   5,   1,  16,   9,   4,   8,   0],
       [  0, 869,   6,   5,   2,   2,   2,   5,  12,   3],
       [  6,  19, 642,  33,  13,   7,  16,  15,  31,   6],
       [  5,   3,  30, 679,   2,  44,   1,  12,  23,  12],
       [  4,   7,   9,   2, 704,   5,  10,   8,   7,  43],
       [  5,   6,  10,  39,  11, 566,  18,   4,  33,  10],
       [  6,   5,  17,   2,   5,  12, 737,   2,   9,   3],
       [  5,   7,   8,  18,  14,   2,   0, 752,   5,  42],
       [  7,  15,  34,  28,  12,  29,   6,   4, 600,  18],
       [  4,   6,   6,  16,  21,   4,   0,  50,   8, 680]], dtype=int64)

2 answers:

Answer 0 (score: 1)

You can use scikit-learn to compute recall and precision for each class.

Example:

from sklearn.metrics import classification_report

y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
target_names = ['class 0', 'class 1', 'class 2']
print(classification_report(y_true, y_pred, target_names=target_names))

              precision    recall  f1-score   support

     class 0       0.50      1.00      0.67         1
     class 1       0.00      0.00      0.00         1
     class 2       1.00      0.67      0.80         3

    accuracy                           0.60         5
   macro avg       0.50      0.56      0.49         5
weighted avg       0.70      0.60      0.61         5

Reference: here
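
If you would rather have the per-class numbers as arrays than as a printed report, precision_score and recall_score accept average=None and return one value per class. A minimal sketch on the same dummy data:

from sklearn.metrics import precision_score, recall_score

y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]

# average=None returns one score per class instead of a single aggregate
print(precision_score(y_true, y_pred, average=None))  # [0.5 0.  1. ]
print(recall_score(y_true, y_pred, average=None))     # [1.         0.         0.66666667]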

Answer 1 (score: 1)

As others have already recommended, you can use sklearn.metrics with precision_score and recall_score on y_actual and y_predicted directly to compute the values you need. Read more about the precision and recall scores here.

However, IIUC, you want to do the same thing using the confusion matrix directly. Here is how you can calculate precision and recall directly from a confusion matrix.


  1. First, I will demonstrate with a dummy example, showing the results from the SKLEARN API, and then calculate them directly.

Note: there are generally 2 types of precision and recall that get calculated -

  • Micro precision: add up the TPs of all classes and divide by the total TP+FP
  • Macro precision: compute TP/(TP+FP) for each class separately, then take the average (ignoring NaNs)
  • You can find more details on the types of precision (and recall) here.

I have shown both approaches below for your understanding -

import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

####################################################
#####Using SKLEARN API on TRUE & PRED Labels########
####################################################

y_true = [0, 1, 2, 2, 1, 1]
y_pred = [0, 2, 2, 2, 1, 2]
confusion_matrix(y_true, y_pred)

precision_micro = precision_score(y_true, y_pred, average="micro")
precision_macro = precision_score(y_true, y_pred, average="macro")
recall_micro = recall_score(y_true, y_pred, average='micro')
recall_macro = recall_score(y_true, y_pred, average="macro")

print("Sklearn API")
print("precision_micro:", precision_micro)
print("precision_macro:", precision_macro)
print("recall_micro:", recall_micro)
print("recall_macro:", recall_macro)

####################################################
####Calculating directly from confusion matrix######
####################################################

cf = confusion_matrix(y_true, y_pred)
TP = cf.diagonal()

precision_micro = TP.sum()/cf.sum()
recall_micro = TP.sum()/cf.sum()

#NOTE: The sum of row-wise sums of a matrix = sum of column-wise sums of a matrix = sum of all elements of a matrix
#Therefore, the micro-precision and micro-recall is mathematically the same for a multi-class problem.


precision_macro = np.nanmean(TP/cf.sum(0))
recall_macro = np.nanmean(TP/cf.sum(1))

print("")
print("Calculated:")
print("precision_micro:", precision_micro)
print("precision_macro:", precision_macro)
print("recall_micro:", recall_micro)
print("recall_macro:", recall_macro)
Output:

Sklearn API
precision_micro: 0.6666666666666666
precision_macro: 0.8333333333333334
recall_micro: 0.6666666666666666
recall_macro: 0.7777777777777777

Calculated:
precision_micro: 0.6666666666666666
precision_macro: 0.8333333333333334
recall_micro: 0.6666666666666666
recall_macro: 0.7777777777777777
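
A quick sanity check, continuing the snippet above: since the hand-computed values overwrite the API results, they can be asserted equal to a fresh API call.

# Sanity check: the values computed from the confusion matrix match the sklearn API
assert np.isclose(precision_micro, precision_score(y_true, y_pred, average="micro"))
assert np.isclose(precision_macro, precision_score(y_true, y_pred, average="macro"))
assert np.isclose(recall_macro, recall_score(y_true, y_pred, average="macro"))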

  2. Now that I have demonstrated that the definitions behind the API work as described, let us calculate precision and recall for your case.
cf = [[748,   0,   4,   5,   1,  16,   9,   4,   8,   0],
      [  0, 869,   6,   5,   2,   2,   2,   5,  12,   3],
      [  6,  19, 642,  33,  13,   7,  16,  15,  31,   6],
      [  5,   3,  30, 679,   2,  44,   1,  12,  23,  12],
      [  4,   7,   9,   2, 704,   5,  10,   8,   7,  43],
      [  5,   6,  10,  39,  11, 566,  18,   4,  33,  10],
      [  6,   5,  17,   2,   5,  12, 737,   2,   9,   3],
      [  5,   7,   8,  18,  14,   2,   0, 752,   5,  42],
      [  7,  15,  34,  28,  12,  29,   6,   4, 600,  18],
      [  4,   6,   6,  16,  21,   4,   0,  50,   8, 680]]

cf = np.array(cf)
TP = cf.diagonal()

precision_micro = TP.sum()/cf.sum()
recall_micro = TP.sum()/cf.sum()


precision_macro = np.nanmean(TP/cf.sum(0))
recall_macro = np.nanmean(TP/cf.sum(1))

print("Calculated:")
print("precision_micro:", precision_micro)
print("precision_macro:", precision_macro)
print("recall_micro:", recall_micro)
print("recall_macro:", recall_macro)
Output:

Calculated:
precision_micro: 0.872125
precision_macro: 0.8702549015235986
recall_micro: 0.872125
recall_macro: 0.8696681555022805
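
If you also want the per-class values (what classification_report prints) rather than the aggregates, the same row and column sums give them directly. A short continuation of the snippet above, reusing cf and TP:

# Per class: column sums are the predicted totals (TP+FP),
# row sums are the actual totals (TP+FN)
precision_per_class = TP / cf.sum(axis=0)
recall_per_class = TP / cf.sum(axis=1)

print("precision per class:", np.round(precision_per_class, 3))
print("recall per class:   ", np.round(recall_per_class, 3))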