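I am using a confusion matrix in sklearn.
My problem is that I cannot tell which label each row corresponds to! My labels are [0, 1, 2, 3, 4, 5].
I would like to know whether the first row is for label 0, the second row for label 1, and so on?
To make sure, I tried the code below, which I thought would build the confusion matrix in the order of the labels, but I got an error.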
cfr = RandomForestClassifier(n_estimators = 80, n_jobs = 5)
cfr.fit(X1, y1)
predictedY2 = cfr.predict(X2)
shape = np.array([0, 1, 2, 3, 4, 5])
acc1 = cfr.score(X2, y2,shape)
The error is:
acc1 = cfr.score(X2, y2,shape)
TypeError: score() takes exactly 3 arguments (4 given)
Answer 0 (score: 1)
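score gives the accuracy of the classifier, i.e. the fraction of examples that were predicted correctly. It takes the test inputs and the true labels, and it does not accept a list of labels, which is why the extra argument raises a TypeError. What you are looking for is the predict function, which produces the predicted class for each input. Take a look at this example: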
import numpy as np
from sklearn.ensemble import RandomForestClassifier as RFC
from sklearn.metrics import confusion_matrix
from sklearn.datasets import make_classification
# Add a random state to the various functions so we all have the same output.
rng = np.random.RandomState(1234)
# Make dataset
X,Y = make_classification( n_samples=1000, n_classes=6, n_features=20, n_informative=15, random_state=rng )
# take random 75% of data as training, leaving rest for test
train_inds = rng.rand(1000) < 0.75
# create and train the classifier
rfc = RFC(n_estimators=80, random_state=rng)
rfc.fit(X[train_inds], Y[train_inds])
# O is the predicted class for each input on the test data
O = rfc.predict(X[~train_inds])
# print() calls so the example also runs under Python 3
print("Test accuracy: %.2f%%\n" % (rfc.score(X[~train_inds], Y[~train_inds]) * 100))
print("Confusion matrix:")
print(confusion_matrix(Y[~train_inds], O))
This prints:
Test accuracy: 57.92%
Confusion matrix:
[[24 4 3 1 1 6]
[ 5 22 4 4 1 1]
[ 5 2 18 5 3 2]
[ 2 4 2 29 1 4]
[ 3 1 3 2 28 3]
[10 4 4 3 8 18]]
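According to the documentation for confusion_matrix, the i,j component of the confusion matrix is the number of objects known to be in class i but classified as class j. By default the rows and columns follow the sorted order of the labels, so with labels 0 through 5, row 0 is label 0, row 1 is label 1, and so on. In the output above, the correctly classified objects are therefore on the diagonal, but if you look at, say, row 3, column 0 (counting from 0), it looks like two "class 3" objects were misclassified as "class 0" objects.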
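If you want to control the row order yourself rather than rely on the default, confusion_matrix accepts a labels argument. Here is a minimal sketch, assuming a small made-up set of true and predicted labels (y_true and y_pred below are invented for illustration and are not the data from the question). It also checks that the accuracy is the sum of the diagonal divided by the total count.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels invented for illustration (not the data from the question).
y_true = np.array([0, 1, 2, 2, 3, 4, 5, 5, 3, 0])
y_pred = np.array([0, 1, 2, 1, 3, 4, 5, 0, 3, 0])

# Passing labels explicitly fixes the order: row k holds the objects whose
# true class is labels[k], column k the objects predicted as labels[k].
labels = [0, 1, 2, 3, 4, 5]
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Without labels, the classes that occur in y_true or y_pred are used in
# sorted order, so for this data the result is the same matrix.
default_cm = confusion_matrix(y_true, y_pred)
same_order = np.array_equal(cm, default_cm)

# Correct predictions sit on the diagonal, so accuracy is trace / total.
accuracy = np.trace(cm) / cm.sum()

# Reversing the label list reverses both the rows and the columns.
reversed_cm = confusion_matrix(y_true, y_pred, labels=labels[::-1])

print(cm)
print("Same as default order:", same_order)
print("Accuracy from diagonal: %.2f" % accuracy)
```

Here cm[2, 1] counts the one "class 2" object that was predicted as class 1, and cm[5, 0] counts the one "class 5" object that was predicted as class 0.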
Hope this helps!