Question

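I am using a confusion matrix in sklearn.

My problem is that I can't tell which label each row corresponds to! My labels are [0, 1, 2, 3, 4, 5].

I would like to know whether the first row is label 0, the second row is label 1, and so on.

To make sure, I tried the code below, which I thought would build the confusion matrix in the order of the labels. But I got an error.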

cfr = RandomForestClassifier(n_estimators = 80, n_jobs = 5)
cfr.fit(X1, y1)
predictedY2 = cfr.predict(X2)
shape = np.array([0, 1, 2, 3, 4, 5])
acc1 = cfr.score(X2, y2,shape)

The error is:

acc1 = cfr.score(X2, y2,shape)
TypeError: score() takes exactly 3 arguments (4 given)

Answer 1

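score gives the accuracy of the classifier, i.e. the fraction of examples that were predicted correctly. What you are looking for is the predict function, which returns the predicted class for each input. Have a look at this example: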

import numpy as np
from sklearn.ensemble import RandomForestClassifier as RFC
from sklearn.metrics import confusion_matrix
from sklearn.datasets import make_classification

# Add a random state to the various functions so we all have the same output.
rng = np.random.RandomState(1234)

# Make dataset
X, Y = make_classification(n_samples=1000, n_classes=6, n_features=20, n_informative=15, random_state=rng)
# take random 75% of data as training, leaving rest for test
train_inds = rng.rand(1000) < 0.75

# create and train the classifier
rfc = RFC(n_estimators=80, random_state=rng)
rfc.fit(X[train_inds], Y[train_inds])

# O is the predicted class for each input on the test data
O = rfc.predict(X[~train_inds])

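# score() returns the mean accuracy on the given test data and labels
print("Test accuracy: %.2f%%\n" % (rfc.score(X[~train_inds], Y[~train_inds]) * 100))

# rows are the true classes, columns are the predicted classes
print("Confusion matrix:")
print(confusion_matrix(Y[~train_inds], O))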

This prints:

Test accuracy: 57.92%

Confusion matrix:
[[24  4  3  1  1  6]
 [ 5 22  4  4  1  1]
 [ 5  2 18  5  3  2]
 [ 2  4  2 29  1  4]
 [ 3  1  3  2 28  3]
 [10  4  4  3  8 18]]
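As a quick sanity check, the accuracy reported by score can be recovered from the confusion matrix itself: the diagonal entries count the correct predictions, and the sum of all entries is the number of test examples. A minimal sketch, with the matrix copied by hand from the output above (the variable names are only for illustration):

```python
import numpy as np

# Confusion matrix copied from the printed output above
# (rows: true class, columns: predicted class).
cm = np.array([
    [24,  4,  3,  1,  1,  6],
    [ 5, 22,  4,  4,  1,  1],
    [ 5,  2, 18,  5,  3,  2],
    [ 2,  4,  2, 29,  1,  4],
    [ 3,  1,  3,  2, 28,  3],
    [10,  4,  4,  3,  8, 18],
])

correct = np.trace(cm)      # sum of the diagonal: correctly classified examples
total = cm.sum()            # every test example lands in exactly one cell
accuracy = correct / total

print("Correct: %d of %d" % (correct, total))   # Correct: 139 of 240
print("Accuracy: %.2f%%" % (accuracy * 100))    # Accuracy: 57.92%
```

The result agrees with the test accuracy printed by score.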

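According to the documentation for confusion_matrix, entry i, j of the confusion matrix is the number of objects known to belong to class i but classified as class j. So in the output above, the correctly classified objects are on the diagonal, but if you look at, say, row 3, column 0, you can see that two "class 3" objects were misclassified as "class 0" objects.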
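On the row order itself: when no labels argument is passed, confusion_matrix uses the labels that appear in y_true or y_pred in sorted order, so with labels [0, 1, 2, 3, 4, 5] row 0 is label 0, row 1 is label 1, and so on. If you want to control the order explicitly, pass labels to confusion_matrix rather than to score. A minimal sketch on made-up toy data (y_true and y_pred below are hypothetical, not taken from the question):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical toy data, only to illustrate the row/column order.
y_true = np.array([0, 1, 2, 2, 3, 4, 5, 5, 3, 0])
y_pred = np.array([0, 2, 2, 2, 0, 4, 5, 1, 3, 0])

labels = [0, 1, 2, 3, 4, 5]

# Default: labels found in y_true/y_pred, in sorted order.
cm_default = confusion_matrix(y_true, y_pred)

# Explicit: row i and column i correspond to labels[i].
cm_explicit = confusion_matrix(y_true, y_pred, labels=labels)

# With sorted integer labels the two matrices are identical.
assert np.array_equal(cm_default, cm_explicit)

# Each row sums to the number of true examples of that label.
for i, label in enumerate(labels):
    assert cm_explicit[i].sum() == np.sum(y_true == label)

# Reversing the label list reverses both rows and columns.
cm_reversed = confusion_matrix(y_true, y_pred, labels=labels[::-1])
assert np.array_equal(cm_reversed, cm_explicit[::-1, ::-1])

print(cm_explicit)
```

Passing labels is also useful when some class never shows up in the test split, since it keeps the matrix at a fixed size with one row and column per listed label.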

Hope this helps!
