Question

I have a multiclass classification task. When I run my script following the scikit example, as shown below:

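from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix
from sklearn.multiclass import OneVsRestClassifier

classifier = OneVsRestClassifier(GradientBoostingClassifier(n_estimators=70, max_depth=3, learning_rate=.02))

y_pred = classifier.fit(X_train, y_train).predict(X_test)
cnf_matrix = confusion_matrix(y_test, y_pred)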

I get this error:

File "C:\ProgramData\Anaconda2\lib\site-packages\sklearn\metrics\classification.py", line 242, in confusion_matrix
    raise ValueError("%s is not supported" % y_type)
ValueError: multilabel-indicator is not supported

I tried passing labels=classifier.classes_ to confusion_matrix(), but that didn't help.
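A minimal sketch of why labels= does not help, using small made-up one-hot arrays with the same layout as in the question (3 classes instead of 6). confusion_matrix first inspects the target type, and scikit-learn's type_of_target classifies a 2-D array of 0/1 columns as "multilabel-indicator"; the labels argument only selects and orders classes, so it does not change that check.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.utils.multiclass import type_of_target

# Made-up one-hot targets with the same layout as the question's (3 classes here)
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
y_pred = np.array([[1, 0, 0], [0, 0, 1], [0, 0, 1]])

# A 2-D array of 0/1 indicator columns is detected as a multilabel target
target_type = type_of_target(y_true)
print(target_type)  # multilabel-indicator

error_message = None
try:
    # labels= only selects and orders classes; the target-type check still runs
    confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
except ValueError as err:
    error_message = str(err)
print(error_message)

# 1-D integer labels (one class index per sample) are accepted
print(type_of_target(y_true.argmax(axis=1)))  # multiclass
```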

y_test and y_pred look like this:

y_test =
array([[0, 0, 0, 1, 0, 0],
   [0, 0, 0, 0, 1, 0],
   [0, 1, 0, 0, 0, 0],
   ..., 
   [0, 0, 0, 0, 0, 1],
   [0, 0, 0, 1, 0, 0],
   [0, 0, 0, 0, 1, 0]])


y_pred = 
array([[0, 0, 0, 0, 0, 0],
   [0, 0, 0, 0, 0, 0],
   [0, 0, 0, 0, 0, 0],
   ..., 
   [0, 0, 0, 0, 0, 1],
   [0, 0, 0, 0, 0, 1],
   [0, 0, 0, 0, 0, 0]])
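Several y_pred rows above are all zeros, i.e. none of the one-vs-rest classifiers fired for those samples. Two consequences are sketched below, with the Iris dataset standing in for the question's data (an assumption). First, np.argmax maps an all-zero row to index 0, so the argmax conversions in the answers silently count such samples as class 0. Second, if y_train is itself one-hot, OneVsRestClassifier treats the task as multilabel; fitting on 1-D class labels instead (for example y_train.argmax(axis=1), a suggestion not taken from the answers) makes predict return one label per sample, which confusion_matrix accepts. The reduced n_estimators and max_depth are only there to keep the sketch fast.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix
from sklearn.multiclass import OneVsRestClassifier

# argmax of an all-zero row is 0, so "no class predicted" becomes class 0
zero_row_argmax = np.array([[0, 0, 0], [0, 1, 0]]).argmax(axis=1)
print(zero_row_argmax)  # [0 1]

# Iris stands in for the question's data (assumption); y is 1-D class labels
X, y = load_iris(return_X_y=True)
clf = OneVsRestClassifier(
    GradientBoostingClassifier(n_estimators=20, max_depth=2, random_state=0)
)
y_pred = clf.fit(X, y).predict(X)
print(y_pred.shape)  # (150,)

cm = confusion_matrix(y, y_pred)
print(cm)
```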

Answer 1

First, you need to turn the output into an array of labels. Suppose you have 3 classes, 'cat', 'dog' and 'house', with indices 0, 1, 2, and the predictions for 2 samples are 'dog' and 'house'. Your output will be:

y_pred = [[0, 1, 0],[0, 0, 1]]

Running np.array(y_pred).argmax(1) gives [1, 2]. This array holds the original label indices, i.e. ['dog', 'house'].

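import numpy as np
from keras.utils import np_utils  # Keras 2.x style import

num_classes = 3

# from label indices to one-hot (categorical) vectors
y_prediction = np.array([1, 2])
y_categorical = np_utils.to_categorical(y_prediction, num_classes)

# from one-hot vectors back to label indices
y_pred = y_categorical.argmax(1)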
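If Keras is not available, the same round trip can be sketched in plain NumPy. This swaps np.eye indexing in for np_utils.to_categorical, so it is not the answer's exact code; class_names is illustrative.

```python
import numpy as np

num_classes = 3
class_names = ["cat", "dog", "house"]  # illustrative names from the answer

y_prediction = np.array([1, 2])
# Row k of the identity matrix is the one-hot vector for class k,
# so indexing it with the label array builds the one-hot matrix.
y_categorical = np.eye(num_classes, dtype=int)[y_prediction]
print(y_categorical)

# argmax along axis 1 recovers the label index of each row
y_pred = y_categorical.argmax(axis=1)
print(y_pred)  # [1 2]
print([class_names[i] for i in y_pred])  # ['dog', 'house']
```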

Answer 2

This worked for me:

import numpy as np
from sklearn.metrics import confusion_matrix

y_test_non_category = [np.argmax(t) for t in y_test]
y_predict_non_category = [np.argmax(t) for t in y_predict]

conf_mat = confusion_matrix(y_test_non_category, y_predict_non_category)

where y_test and y_predict are categorical (one-hot encoded) arrays.
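A self-contained sketch of this approach on made-up arrays, using the vectorised argmax(axis=1) in place of the list comprehensions (it gives the same indices). Passing labels= is optional; here it just fixes the row and column order. In scikit-learn's convention, rows are true classes and columns are predicted classes.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up one-hot arrays for 3 classes (rows are samples)
y_test = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]])
y_predict = np.array([[1, 0, 0], [0, 0, 1], [0, 0, 1], [0, 1, 0]])

# Vectorised equivalent of [np.argmax(t) for t in ...]
y_test_labels = y_test.argmax(axis=1)        # [0 1 2 2]
y_predict_labels = y_predict.argmax(axis=1)  # [0 2 2 1]

conf_mat = confusion_matrix(y_test_labels, y_predict_labels, labels=[0, 1, 2])
print(conf_mat)
# [[1 0 0]
#  [0 0 1]
#  [0 1 1]]
```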

Answer 3

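I simply subtracted the true y_test matrix from the predicted y_pred matrix (y_pred - y_test) while keeping the categorical format. I treat a -1 as a false negative and a 1 as a false positive.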

Next step:

for i in range(produced_matrix.shape[0]):
    for j in range(produced_matrix.shape[1]):
        if output_matrix[i, j] == 1 and predictions_matrix[i, j] == 1:
            produced_matrix[i, j] = 2

This ends up with the following encoding:

-1: false negative
1: false positive
0: true negative
2: true positive

Finally, with some simple counting you can produce any confusion metric.
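A vectorised sketch of this scheme on made-up one-hot arrays, computing y_pred - y_test so that 1 marks a false positive and -1 a false negative, matching the encoding above. The per-class counts are one example of the "simple counting"; the array names are illustrative. Signed integer arrays are assumed, since subtracting unsigned arrays would wrap around instead of producing -1.

```python
import numpy as np

# Made-up one-hot arrays: rows are samples, columns are classes
y_test = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]])
y_pred = np.array([[1, 0, 0], [0, 0, 1], [0, 0, 1], [0, 0, 0]])

# Encoding from the answer: -1 FN, 1 FP, 0 TN, 2 TP
produced = y_pred - y_test
produced[(y_pred == 1) & (y_test == 1)] = 2

# Per-class counts, one entry per column (class)
tp = (produced == 2).sum(axis=0)
fp = (produced == 1).sum(axis=0)
fn = (produced == -1).sum(axis=0)
tn = (produced == 0).sum(axis=0)
print("TP", tp, "FP", fp, "FN", fn, "TN", tn)
# TP [1 0 1] FP [0 0 1] FN [0 1 1] TN [3 3 1]
```

From these counts, per-class precision is tp / (tp + fp) and recall is tp / (tp + fn), guarding against classes where the denominator is zero.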

How to compute the confusion matrix for multiclass classification in Scikit?
