Which label does each row of the confusion matrix correspond to in Python?

Date: 2015-11-04 02:55:51

Tags: python numpy import scikit-learn

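I am using a confusion matrix in sklearn.

My problem is that I can't tell which label each row corresponds to! My labels are [0, 1, 2, 3, 4, 5].

I would like to know whether the first row is for label 0, the second row for label 1, and so on.

To make sure, I tried the code below, which I thought would build the confusion matrix in the order of the labels. But I got an error.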

cfr = RandomForestClassifier(n_estimators = 80, n_jobs = 5)
cfr.fit(X1, y1)
predictedY2 = cfr.predict(X2)
shape = np.array([0, 1, 2, 3, 4, 5])
acc1 = cfr.score(X2, y2,shape)

The error is:

acc1 = cfr.score(X2, y2,shape)
TypeError: score() takes exactly 3 arguments (4 given)

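1 answer:

Answer 0 (score: 1)

score gives the accuracy of the classifier, i.e. the fraction of examples that were predicted correctly. It does not take a list of labels, which is why the extra argument raises the error. What you are looking for is the predict function, which produces the predicted class for each input; you then pass the true and predicted classes to confusion_matrix. Take a look at this example: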

import numpy as np
from sklearn.ensemble import RandomForestClassifier as RFC
from sklearn.metrics import confusion_matrix
from sklearn.datasets import make_classification

# Add a random state to the various functions so we all have the same output.
rng = np.random.RandomState(1234)

# Make dataset
X, Y = make_classification(n_samples=1000, n_classes=6, n_features=20, n_informative=15, random_state=rng)
# take random 75% of data as training, leaving rest for test
train_inds = rng.rand(1000) < 0.75

# create and train the classifier
rfc = RFC(n_estimators=80, random_state=rng)
rfc.fit(X[train_inds], Y[train_inds])

# O is the predicted class for each input on the test data
O = rfc.predict(X[~train_inds])

print("Test accuracy: %.2f%%\n" % (rfc.score(X[~train_inds], Y[~train_inds]) * 100))

print("Confusion matrix:")
print(confusion_matrix(Y[~train_inds], O))

This prints (exact numbers may differ across scikit-learn and NumPy versions):

Test accuracy: 57.92%

Confusion matrix:
[[24  4  3  1  1  6]
 [ 5 22  4  4  1  1]
 [ 5  2 18  5  3  2]
 [ 2  4  2 29  1  4]
 [ 3  1  3  2 28  3]
 [10  4  4  3  8 18]]

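According to the documentation for confusion_matrix, the (i, j) entry of the confusion matrix is the number of objects known to belong to class i but classified as class j. So in the output above, the correctly classified objects lie on the diagonal, while, for example, row 3, column 0 shows that two "class 3" objects were misclassified as "class 0" objects.

Hope this helps!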
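To address the row-order question directly: when no label list is given, confusion_matrix uses the labels that appear in the true or predicted values in sorted order, so with labels 0 to 5 the first row is label 0, the second is label 1, and so on. If you want to fix the order yourself, confusion_matrix accepts a labels keyword argument. The following is a minimal sketch; the y_true and y_pred lists are made-up toy data for illustration, not taken from the question.

```python
from sklearn.metrics import confusion_matrix

# Toy data, invented for illustration only (not from the question).
y_true = [0, 1, 2, 2, 3, 3, 3, 4, 5, 5]
y_pred = [0, 1, 2, 0, 3, 3, 0, 4, 5, 4]

# Explicit label order: row i / column i correspond to labels[i].
labels = [0, 1, 2, 3, 4, 5]
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Without `labels`, the sorted labels found in the data are used,
# so the result is the same here.
cm_default = confusion_matrix(y_true, y_pred)

# Reversing the label list reverses both the rows and the columns.
cm_rev = confusion_matrix(y_true, y_pred, labels=labels[::-1])

# Print each row next to the true label it belongs to.
for label, row in zip(labels, cm):
    print("true label %d: %s" % (label, row.tolist()))
```

Here cm[3][0] counts the samples whose true label is 3 but which were predicted as 0. Passing labels explicitly is also useful when a class is missing from a particular test split, since the matrix then still keeps a row and a column for it.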