I wrote some confusion matrix calculation code in Python:
def conf_mat(prob_arr, input_arr):
    # confusion matrix
    conf_arr = [[0, 0], [0, 0]]

    for i in range(len(prob_arr)):
        if int(input_arr[i]) == 1:
            if float(prob_arr[i]) < 0.5:
                conf_arr[0][1] = conf_arr[0][1] + 1
            else:
                conf_arr[0][0] = conf_arr[0][0] + 1
        elif int(input_arr[i]) == 2:
            if float(prob_arr[i]) >= 0.5:
                conf_arr[1][0] = conf_arr[1][0] + 1
            else:
                conf_arr[1][1] = conf_arr[1][1] + 1

    accuracy = float(conf_arr[0][0] + conf_arr[1][1]) / (len(input_arr))
prob_arr is an array that my classification code returns; a sample array looks like this:
[1.0, 1.0, 1.0, 0.41592955657342651, 1.0, 0.0053405015805891975, 4.5321494433440449e-299, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.70943426182688163, 1.0, 1.0, 1.0, 1.0]
input_arr contains the original class labels of the dataset; it looks like this:
[2, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 2, 1, 1, 1]
What my code is trying to do is: I take prob_arr and input_arr and, for each class (1 and 2), check whether the samples were misclassified.
However, my code only works for two classes. If I run it on data with multiple classes, it doesn't work. How can I make it work for multiple classes?
For example, for a dataset with three classes, it should return: [[21, 7, 3], [3, 38, 6], [5, 4, 19]]
Answer 0 (score: 120)
Scikit-learn provides a confusion_matrix function:
from sklearn.metrics import confusion_matrix
y_actu = [2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2]
y_pred = [0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2]
confusion_matrix(y_actu, y_pred)
which outputs a NumPy array:
array([[3, 0, 0],
       [0, 1, 2],
       [2, 1, 3]])
But you can also create a confusion matrix with Pandas:
import pandas as pd
y_actu = pd.Series([2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2], name='Actual')
y_pred = pd.Series([0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2], name='Predicted')
df_confusion = pd.crosstab(y_actu, y_pred)
You will get a (nicely labeled) Pandas DataFrame:
Predicted  0  1  2
Actual
0          3  0  0
1          0  1  2
2          2  1  3
If you add margins=True, like this:
df_confusion = pd.crosstab(y_actu, y_pred, rownames=['Actual'], colnames=['Predicted'], margins=True)
you will also get the sum of each row and column:
Predicted  0  1  2  All
Actual
0          3  0  0    3
1          0  1  2    3
2          2  1  3    6
All        5  2  5   12
You can also get a normalized confusion matrix using:
df_conf_norm = df_confusion / df_confusion.sum(axis=1)
Predicted         0         1         2
Actual
0          1.000000  0.000000  0.000000
1          0.000000  0.333333  0.333333
2          0.666667  0.333333  0.500000
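Note that in the output above the rows do not sum to 1: dividing a DataFrame by a Series aligns the Series index with the DataFrame columns, so each column ends up divided by the corresponding row total. If you want each row normalized by its own total, a small sketch using the df_confusion defined above:
df_conf_norm = df_confusion.div(df_confusion.sum(axis=1), axis=0)
print(df_conf_norm)  # every row now sums to 1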
You can plot this confusion_matrix using matplotlib:
import matplotlib.pyplot as plt
import numpy as np  # needed for np.arange below
def plot_confusion_matrix(df_confusion, title='Confusion matrix', cmap=plt.cm.gray_r):
    plt.matshow(df_confusion, cmap=cmap)  # imshow
    # plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(df_confusion.columns))
    plt.xticks(tick_marks, df_confusion.columns, rotation=45)
    plt.yticks(tick_marks, df_confusion.index)
    # plt.tight_layout()
    plt.ylabel(df_confusion.index.name)
    plt.xlabel(df_confusion.columns.name)

plot_confusion_matrix(df_confusion)
Plot the normalized confusion matrix with:
plot_confusion_matrix(df_conf_norm)
You might also be interested in this project https://github.com/pandas-ml/pandas-ml and its pip package https://pypi.python.org/pypi/pandas_ml. With this package, the confusion matrix can be pretty-printed and plotted. You can binarize the confusion matrix and get class statistics such as TP, TN, FP, FN, ACC, TPR, FPR, FNR, TNR (SPC), LR+, LR-, DOR, PPV, FDR, FOR, NPV, as well as some overall statistics:
In [1]: from pandas_ml import ConfusionMatrix
In [2]: y_actu = [2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2]
In [3]: y_pred = [0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2]
In [4]: cm = ConfusionMatrix(y_actu, y_pred)
In [5]: cm.print_stats()
Confusion Matrix:

Predicted  0  1  2  __all__
Actual
0          3  0  0        3
1          0  1  2        3
2          2  1  3        6
__all__    5  2  5       12
Overall Statistics:
Accuracy: 0.583333333333
95% CI: (0.27666968568210581, 0.84834777019156982)
No Information Rate: ToDo
P-Value [Acc > NIR]: 0.189264302376
Kappa: 0.354838709677
Mcnemar's Test P-Value: ToDo
Class Statistics:
Classes 0 1 2
Population 12 12 12
P: Condition positive 3 3 6
N: Condition negative 9 9 6
Test outcome positive 5 2 5
Test outcome negative 7 10 7
TP: True Positive 3 1 3
TN: True Negative 7 8 4
FP: False Positive 2 1 2
FN: False Negative 0 2 3
TPR: (Sensitivity, hit rate, recall) 1 0.3333333 0.5
TNR=SPC: (Specificity) 0.7777778 0.8888889 0.6666667
PPV: Pos Pred Value (Precision) 0.6 0.5 0.6
NPV: Neg Pred Value 1 0.8 0.5714286
FPR: False-out 0.2222222 0.1111111 0.3333333
FDR: False Discovery Rate 0.4 0.5 0.4
FNR: Miss Rate 0 0.6666667 0.5
ACC: Accuracy 0.8333333 0.75 0.5833333
F1 score 0.75 0.4 0.5454545
MCC: Matthews correlation coefficient 0.6831301 0.2581989 0.1690309
Informedness 0.7777778 0.2222222 0.1666667
Markedness 0.6 0.3 0.1714286
Prevalence 0.25 0.25 0.5
LR+: Positive likelihood ratio 4.5 3 1.5
LR-: Negative likelihood ratio 0 0.75 0.75
DOR: Diagnostic odds ratio inf 4 2
FOR: False omission rate 0 0.2 0.4285714
I also noticed that a new Python library for confusion matrices, named PyCM, has come out: you might want to take a look at it.
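A minimal PyCM usage sketch (assuming its ConfusionMatrix constructor takes actual_vector and predict_vector keyword arguments; check the PyCM documentation for the current API):
from pycm import ConfusionMatrix
y_actu = [2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2]
y_pred = [0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2]
cm = ConfusionMatrix(actual_vector=y_actu, predict_vector=y_pred)
print(cm)  # pretty-prints the matrix along with per-class and overall statistics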
Answer 1 (score: 13)
Scikit-learn (which I recommend using) includes it in the metrics module:
>>> from sklearn.metrics import confusion_matrix
>>> y_true = [0, 1, 2, 0, 1, 2, 0, 1, 2]
>>> y_pred = [0, 0, 0, 0, 1, 1, 0, 2, 2]
>>> confusion_matrix(y_true, y_pred)
array([[3, 0, 0],
       [1, 1, 1],
       [1, 1, 1]])
Answer 2 (score: 10)
If you don't want scikit-learn to do the work for you...
import numpy

actual = numpy.array(actual)
predicted = numpy.array(predicted)

# calculate the confusion matrix; labels is a numpy array of classification labels
cm = numpy.zeros((len(labels), len(labels)))
for a, p in zip(actual, predicted):
    cm[a][p] += 1

# also get the accuracy easily with numpy
accuracy = (actual == predicted).sum() / float(len(actual))
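The snippet above assumes that actual, predicted and labels already exist; a minimal self-contained usage sketch, assuming integer class labels starting at 0:
import numpy
actual = numpy.array([0, 1, 2, 0, 1, 2, 0, 1, 2])
predicted = numpy.array([0, 0, 0, 0, 1, 1, 0, 2, 2])
labels = numpy.unique(actual)  # array([0, 1, 2])
cm = numpy.zeros((len(labels), len(labels)))
for a, p in zip(actual, predicted):
    cm[a][p] += 1
print(cm)  # rows are the actual class, columns the predicted class
print((actual == predicted).sum() / float(len(actual)))  # accuracy: 5 of 9 correct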
Or take a look at a more complete implementation in NLTK.
Answer 3 (score: 9)
Almost a decade has passed, yet most of the (non-sklearn) solutions in this post are convoluted and unnecessarily long. Computing a confusion matrix can be done cleanly in a few lines of Python. For example:
import numpy as np

def compute_confusion_matrix(true, pred):
    '''Computes a confusion matrix using numpy for two np.arrays
    true and pred.

    Results are identical (and similar in computation time) to:
    "from sklearn.metrics import confusion_matrix"

    However, this function avoids the dependency on sklearn.'''

    K = len(np.unique(true))  # Number of classes
    result = np.zeros((K, K))

    for i in range(len(true)):
        result[true[i]][pred[i]] += 1

    return result
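A quick usage sketch, assuming 0-based integer labels (the label values are used directly as matrix indices):
true = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])
pred = np.array([0, 0, 0, 0, 1, 1, 0, 2, 2])
print(compute_confusion_matrix(true, pred))
# [[3. 0. 0.]
#  [1. 1. 1.]
#  [1. 1. 1.]]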
Answer 4 (score: 3)
This function creates confusion matrices for an arbitrary number of classes.
def create_conf_matrix(expected, predicted, n_classes):
    m = [[0] * n_classes for i in range(n_classes)]
    for pred, exp in zip(predicted, expected):
        m[pred][exp] += 1
    return m

def calc_accuracy(conf_matrix):
    t = sum(sum(l) for l in conf_matrix)  # use float(t) here on Python 2 to avoid integer division
    return sum(conf_matrix[i][i] for i in range(len(conf_matrix))) / t
In contrast to your function above, you have to extract the predicted classes before calling it, based on your classification results, i.e. something like
[1 if p >= .5 else 2 for p in classifications]
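A quick usage sketch of the two functions above, assuming 0-based integer class labels so they can be used directly as list indices (with the 1/2 labels from the question you would shift them down by one first):
expected  = [0, 1, 2, 0, 1, 2]
predicted = [0, 0, 2, 0, 1, 1]
m = create_conf_matrix(expected, predicted, n_classes=3)
print(m)                 # [[2, 1, 0], [0, 1, 1], [0, 0, 1]]
print(calc_accuracy(m))  # 4 of 6 on the diagonal -> about 0.67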
Answer 5 (score: 2)
Here is a confusion matrix class that supports pretty-printing, etc.:
http://nltk.googlecode.com/svn/trunk/doc/api/nltk.metrics.confusionmatrix-pysrc.html
Answer 6 (score: 1)
You can make your code more concise and (sometimes) faster by using numpy. For example, in the two-class case your function can be rewritten as (see mply.acc()):
def accuracy(actual, predicted):
    """accuracy = (tp + tn) / ts

    , where:

        ts - Total Samples
        tp - True Positives
        tn - True Negatives
    """
    return (actual == predicted).sum() / float(len(actual))
where:
actual = (numpy.array(input_arr) == 2)
predicted = (numpy.array(prob_arr) < 0.5)
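A short sketch tying this back to the arrays from the question (assuming, as in your original code, that a probability below 0.5 means class 2):
import numpy
prob_arr  = [1.0, 0.3, 0.2, 0.1, 0.8]
input_arr = [1, 1, 2, 2, 1]
actual = (numpy.array(input_arr) == 2)
predicted = (numpy.array(prob_arr) < 0.5)
print(accuracy(actual, predicted))  # 4 of 5 samples agree -> 0.8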
Answer 7 (score: 1)
A small change to cgnorthcutt's solution, to take string-typed variables into account:
import numpy as np

def get_confusion_matrix(l1, l2):
    assert len(l1) == len(l2), "Two lists have different size."
    K = len(np.unique(l1))
    # create a label -> index mapping
    label_index = dict(zip(np.unique(l1), np.arange(K)))
    result = np.zeros((K, K))
    for i in range(len(l1)):
        result[label_index[l1[i]]][label_index[l2[i]]] += 1
    return result
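A quick usage sketch with string labels (note that l2 may only contain labels that also appear in l1, otherwise the dictionary lookup raises a KeyError):
y_true = ["cat", "dog", "cat", "bird", "dog"]
y_pred = ["cat", "cat", "cat", "bird", "dog"]
print(get_confusion_matrix(y_true, y_pred))
# [[1. 0. 0.]   rows/columns ordered as np.unique(y_true): bird, cat, dog
#  [0. 2. 0.]
#  [0. 1. 1.]]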
Answer 8 (score: 1)
Update
Since writing this post, I've updated my library implementation to include a few other nice features. As with the code below, no third-party dependencies are required. The class can also output a nicely tabulated form, similar to many common statistics packages. See this Gist.
A simple multi-class implementation
A multi-class confusion matrix can be computed very simply in vanilla Python in roughly O(N) time. All we need to do is pair up the unique classes found in the actual vector into a 2-dimensional list. From there, we simply iterate through the zipped actual and predicted vectors and populate the counts.
# A Simple Confusion Matrix Implementation
def confusionmatrix(actual, predicted, normalize = False):
    """
    Generate a confusion matrix for multiple classification
    @params:
        actual      - a list of integers or strings for known classes
        predicted   - a list of integers or strings for predicted classes
        normalize   - optional boolean for matrix normalization
    @return:
        matrix      - a 2-dimensional list of pairwise counts
    """
    unique = sorted(set(actual))
    matrix = [[0 for _ in unique] for _ in unique]
    imap   = {key: i for i, key in enumerate(unique)}
    # Generate Confusion Matrix
    for p, a in zip(predicted, actual):
        matrix[imap[p]][imap[a]] += 1
    # Matrix Normalization
    if normalize:
        sigma = sum([sum(matrix[imap[i]]) for i in unique])
        matrix = [row for row in map(lambda i: list(map(lambda j: j / sigma, i)), matrix)]
    return matrix
Usage
# Input Below Should Return: [[2, 1, 0], [0, 2, 1], [1, 2, 1]]
cm = confusionmatrix(
    [1, 1, 2, 0, 1, 1, 2, 0, 0, 1], # actual
    [0, 1, 1, 0, 2, 1, 2, 2, 0, 2]  # predicted
)

# And The Output
print(cm)
[[2, 1, 0], [0, 2, 1], [1, 2, 1]]
Note: the actual classes are along the columns and the predicted classes are along the rows.
  # Actual
  # 0  1  2
  #  #  #
[[2, 1, 0], # 0
 [0, 2, 1], # 1  Predicted
 [1, 2, 1]] # 2
The class names can be strings or integers
# Input Below Should Return: [[2, 1, 0], [0, 2, 1], [1, 2, 1]]
cm = confusionmatrix(
    ["B", "B", "C", "A", "B", "B", "C", "A", "A", "B"], # actual
    ["A", "B", "B", "A", "C", "B", "C", "C", "A", "C"]  # predicted
)

# And The Output
print(cm)
[[2, 1, 0], [0, 2, 1], [1, 2, 1]]
You can also return the matrix with proportions (normalization)
# Input Below Should Return: [[0.2, 0.1, 0.0], [0.0, 0.2, 0.1], [0.1, 0.2, 0.1]]
cm = confusionmatrix(
    ["B", "B", "C", "A", "B", "B", "C", "A", "A", "B"], # actual
    ["A", "B", "B", "A", "C", "B", "C", "C", "A", "C"], # predicted
    normalize = True
)

# And The Output
print(cm)
[[0.2, 0.1, 0.0], [0.0, 0.2, 0.1], [0.1, 0.2, 0.1]]
Extracting statistics from a multi-class confusion matrix
Once you have the matrix, you can compute a bunch of statistics to assess your classifier. That said, extracting the values out of a confusion matrix set up for multiple classification can be a bit of a headache. Here's a function that returns both the confusion matrix and statistics by class:
# Not Required, But Nice For Legibility
from collections import OrderedDict

# A Simple Confusion Matrix Implementation
def confusionmatrix(actual, predicted, normalize = False):
    """
    Generate a confusion matrix for multiple classification
    @params:
        actual      - a list of integers or strings for known classes
        predicted   - a list of integers or strings for predicted classes
    @return:
        matrix      - a 2-dimensional list of pairwise counts
        statistics  - a dictionary of statistics for each class
    """
    unique = sorted(set(actual))
    matrix = [[0 for _ in unique] for _ in unique]
    imap   = {key: i for i, key in enumerate(unique)}
    # Generate Confusion Matrix
    for p, a in zip(predicted, actual):
        matrix[imap[p]][imap[a]] += 1
    # Get Confusion Matrix Sum
    sigma = sum([sum(matrix[imap[i]]) for i in unique])
    # Scaffold Statistics Data Structure
    statistics = OrderedDict(((i, {"counts" : OrderedDict(), "stats" : OrderedDict()}) for i in unique))
    # Iterate Through Classes & Compute Statistics
    for i in unique:
        loc = matrix[imap[i]][imap[i]]
        row = sum(matrix[imap[i]][:])
        col = sum([row[imap[i]] for row in matrix])
        # Get TP/TN/FP/FN
        tp = loc
        fp = row - loc
        fn = col - loc
        tn = sigma - row - col + loc
        # Populate Counts Dictionary
        statistics[i]["counts"]["tp"]  = tp
        statistics[i]["counts"]["fp"]  = fp
        statistics[i]["counts"]["tn"]  = tn
        statistics[i]["counts"]["fn"]  = fn
        statistics[i]["counts"]["pos"] = tp + fn
        statistics[i]["counts"]["neg"] = tn + fp
        statistics[i]["counts"]["n"]   = tp + tn + fp + fn
        # Populate Statistics Dictionary
        statistics[i]["stats"]["sensitivity"] = tp / (tp + fn) if tp > 0 else 0.0
        statistics[i]["stats"]["specificity"] = tn / (tn + fp) if tn > 0 else 0.0
        statistics[i]["stats"]["precision"]   = tp / (tp + fp) if tp > 0 else 0.0
        statistics[i]["stats"]["recall"]      = tp / (tp + fn) if tp > 0 else 0.0
        statistics[i]["stats"]["tpr"]         = tp / (tp + fn) if tp > 0 else 0.0
        statistics[i]["stats"]["tnr"]         = tn / (tn + fp) if tn > 0 else 0.0
        statistics[i]["stats"]["fpr"]         = fp / (fp + tn) if fp > 0 else 0.0
        statistics[i]["stats"]["fnr"]         = fn / (fn + tp) if fn > 0 else 0.0
        statistics[i]["stats"]["accuracy"]    = (tp + tn) / (tp + tn + fp + fn) if (tp + tn) > 0 else 0.0
        statistics[i]["stats"]["f1score"]     = (2 * tp) / ((2 * tp) + (fp + fn)) if tp > 0 else 0.0
        statistics[i]["stats"]["fdr"]         = fp / (fp + tp) if fp > 0 else 0.0
        statistics[i]["stats"]["for"]         = fn / (fn + tn) if fn > 0 else 0.0
        statistics[i]["stats"]["ppv"]         = tp / (tp + fp) if tp > 0 else 0.0
        statistics[i]["stats"]["npv"]         = tn / (tn + fn) if tn > 0 else 0.0
    # Matrix Normalization
    if normalize:
        matrix = [row for row in map(lambda i: list(map(lambda j: j / sigma, i)), matrix)]
    return matrix, statistics
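A short usage sketch, calling the function on the string-labeled example from earlier (the dictionary shown in the next section was produced by a call like this):
cm, statistics = confusionmatrix(
    ["B", "B", "C", "A", "B", "B", "C", "A", "A", "B"], # actual
    ["A", "B", "B", "A", "C", "B", "C", "C", "A", "C"]  # predicted
)
print(cm)                                     # [[2, 1, 0], [0, 2, 1], [1, 2, 1]]
print(statistics["A"]["counts"]["tp"])        # 2
print(statistics["A"]["stats"]["precision"])  # 0.666...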
Computed statistics
Above, the confusion matrix is used to tabulate statistics for each class, which are returned in an OrderedDict with the following structure:
OrderedDict(
[
('A', {
'stats' : OrderedDict([
('sensitivity', 0.6666666666666666),
('specificity', 0.8571428571428571),
('precision', 0.6666666666666666),
('recall', 0.6666666666666666),
('tpr', 0.6666666666666666),
('tnr', 0.8571428571428571),
('fpr', 0.14285714285714285),
('fnr', 0.3333333333333333),
('accuracy', 0.8),
('f1score', 0.6666666666666666),
('fdr', 0.3333333333333333),
('for', 0.14285714285714285),
('ppv', 0.6666666666666666),
('npv', 0.8571428571428571)
]),
'counts': OrderedDict([
('tp', 2),
('fp', 1),
('tn', 6),
('fn', 1),
('pos', 3),
('neg', 7),
('n', 10)
])
}),
('B', {
'stats': OrderedDict([
('sensitivity', 0.4),
('specificity', 0.8),
('precision', 0.6666666666666666),
('recall', 0.4),
('tpr', 0.4),
('tnr', 0.8),
('fpr', 0.2),
('fnr', 0.6),
('accuracy', 0.6),
('f1score', 0.5),
('fdr', 0.3333333333333333),
('for', 0.42857142857142855),
('ppv', 0.6666666666666666),
('npv', 0.5714285714285714)
]),
'counts': OrderedDict([
('tp', 2),
('fp', 1),
('tn', 4),
('fn', 3),
('pos', 5),
('neg', 5),
('n', 10)
])
}),
('C', {
'stats': OrderedDict([
('sensitivity', 0.5),
('specificity', 0.625),
('precision', 0.25),
('recall', 0.5),
('tpr', 0.5),
('tnr', 0.625),
('fpr', 0.375),
('fnr', 0.5),
('accuracy', 0.6),
('f1score', 0.3333333333333333),
('fdr', 0.75),
('for', 0.16666666666666666),
('ppv', 0.25),
('npv', 0.8333333333333334)
]),
'counts': OrderedDict([
('tp', 1),
('fp', 3),
('tn', 5),
('fn', 1),
('pos', 2),
('neg', 8),
('n', 10)
])
})
]
)
Answer 9 (score: 0)
Although the sklearn solution is really clean, it is quite slow when you compare it with a numpy-only solution. Let me give an example and a better/faster solution.
import time
import numpy as np
from sklearn.metrics import confusion_matrix
num_classes = 3
true = np.random.randint(0, num_classes, 10000000)
pred = np.random.randint(0, num_classes, 10000000)
First, the sklearn solution for reference:
start = time.time()
confusion = confusion_matrix(true, pred)
print('time: ' + str(time.time() - start)) # time: 9.31
Now a faster solution using only numpy. In this case, instead of iterating over all samples, we iterate over the confusion matrix and count the value of each cell. This makes the process very fast.
start = time.time()
confusion = np.zeros((num_classes, num_classes)).astype(np.int64)
for i in range(num_classes):
    for j in range(num_classes):
        confusion[i][j] = np.sum(np.logical_and(true == i, pred == j))
print('time: ' + str(time.time() - start)) # time: 0.34
Answer 10 (score: 0)
You should map from the classes to a row in the confusion matrix.
Here the mapping is trivial:
def row_of_class(classe):
    return {1: 0, 2: 1}[classe]
In your loop, compute expected_row and correct_row, and increment conf_arr[expected_row][correct_row]. You will even end up with less code than you started with; a sketch of that loop follows.
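A minimal sketch of the generalized loop described above, assuming a labels list of all classes and predictions that have already been thresholded into class labels (these names are illustrative, not from the original answer):
def conf_mat(predicted_classes, input_arr, labels):
    row = {c: i for i, c in enumerate(labels)}  # class -> row/column index
    conf_arr = [[0] * len(labels) for _ in labels]
    for actual, pred in zip(input_arr, predicted_classes):
        conf_arr[row[actual]][row[pred]] += 1   # rows = actual, columns = predicted
    return conf_arr

# e.g. for the two-class case from the question:
# predicted_classes = [1 if p >= 0.5 else 2 for p in prob_arr]
# conf_mat(predicted_classes, input_arr, labels=[1, 2])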
Answer 11 (score: 0)
It can be computed simply as follows:
def confusionMatrix(actual, pred):
    TP = (actual == pred)[actual].sum()
    TN = (actual == pred)[~actual].sum()
    FP = (actual != pred)[~actual].sum()
    FN = (actual != pred)[actual].sum()
    return [[TP, TN], [FP, FN]]
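This relies on actual and pred being boolean numpy arrays, so they can be used as masks; a quick sketch:
import numpy as np
actual = np.array([True, True, False, False, True])
pred   = np.array([True, False, False, True, True])
print(confusionMatrix(actual, pred))  # [[2, 1], [1, 1]], i.e. [[TP, TN], [FP, FN]]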
Answer 12 (score: 0)
Here is a simple implementation that handles an unequal number of classes in the predicted and actual labels (see examples 3 and 4). I hope this helps!
For folks just learning this, here's a quick review. The column labels indicate the predicted class, and the row labels indicate the correct class. In example 1, we have [3 1] in the top row. Again, rows indicate truth, so this means the correct label is "0" and there are 4 examples with ground-truth label "0". Columns indicate predictions, so 3/4 of the samples were correctly labeled "0", but 1/4 was incorrectly labeled "1".
import numpy as np

def confusion_matrix(actual, predicted):
    classes = np.unique(np.concatenate((actual, predicted)))
    confusion_mtx = np.empty((len(classes), len(classes)), dtype=int)
    for i, a in enumerate(classes):
        for j, p in enumerate(classes):
            confusion_mtx[i, j] = np.where((actual == a) * (predicted == p))[0].shape[0]
    return confusion_mtx
Example 1:
actual = np.array([1,1,1,1,0,0,0,0])
predicted = np.array([1,1,1,1,0,0,0,1])
confusion_matrix(actual,predicted)
   0  1
0  3  1
1  0  4
Example 2:
actual = np.array(["a","a","a","a","b","b","b","b"])
predicted = np.array(["a","a","a","a","b","b","b","a"])
confusion_matrix(actual,predicted)
   0  1
0  4  0
1  1  3
Example 3:
actual = np.array(["a","a","a","a","b","b","b","b"])
predicted = np.array(["a","a","a","a","b","b","b","z"]) # <-- notice the 3rd class, "z"
confusion_matrix(actual,predicted)
   0  1  2
0  4  0  0
1  0  3  1
2  0  0  0
Example 4:
actual = np.array(["a","a","a","x","x","b","b","b"]) # <-- notice the 4th class, "x"
predicted = np.array(["a","a","a","a","b","b","b","z"])
confusion_matrix(actual,predicted)
   0  1  2  3
0  3  0  0  0
1  0  2  0  1
2  1  1  0  0
3  0  0  0  0
Answer 13 (score: 0)
A numpy-only solution that works for any number of classes and requires no loops:
import numpy as np
classes = 3
true = np.random.randint(0, classes, 50)
pred = np.random.randint(0, classes, 50)
np.bincount(true * classes + pred).reshape((classes, classes))
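One caveat with this trick: if some (true, predicted) combination never occurs at the high end, bincount returns fewer than classes ** 2 entries and the reshape fails. Passing minlength guards against that:
np.bincount(true * classes + pred, minlength=classes ** 2).reshape((classes, classes))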
Answer 14 (score: 0)
Using only numpy, and with efficiency in mind, we can do the following:
import logging
import numpy as np

def confusion_matrix(pred, label, nc=None):
    assert pred.size == label.size
    if nc is None:
        nc = len(np.unique(label))
        logging.debug("Number of classes assumed to be {}".format(nc))

    confusion = np.zeros([nc, nc])
    # avoid the confusion with `0`
    tran_pred = pred + 1
    for i in range(nc):    # current class
        mask = (label == i)
        masked_pred = mask * tran_pred
        cls, counts = np.unique(masked_pred, return_counts=True)
        # discard the first item
        cls = [cl - 1 for cl in cls][1:]
        counts = counts[1:]
        for cl, count in zip(cls, counts):
            confusion[i, cl] = count
    return confusion
For other features such as plotting and mean-IoU, see my repositories.
Answer 15 (score: 0)
I wrote a simple class to build a confusion matrix without needing to depend on a machine learning library.
The class can be used like so:
labels = ["cat", "dog", "velociraptor", "kraken", "pony"]
confusionMatrix = ConfusionMatrix(labels)
confusionMatrix.update("cat", "cat")
confusionMatrix.update("cat", "dog")
...
confusionMatrix.update("kraken", "velociraptor")
confusionMatrix.update("velociraptor", "velociraptor")
confusionMatrix.plot()
The ConfusionMatrix class:
import pylab
import collections
import numpy as np


class ConfusionMatrix:
    def __init__(self, labels):
        self.labels = labels
        self.confusion_dictionary = self.build_confusion_dictionary(labels)

    def update(self, predicted_label, expected_label):
        self.confusion_dictionary[expected_label][predicted_label] += 1

    def build_confusion_dictionary(self, label_set):
        expected_labels = collections.OrderedDict()

        for expected_label in label_set:
            expected_labels[expected_label] = collections.OrderedDict()

            for predicted_label in label_set:
                expected_labels[expected_label][predicted_label] = 0.0

        return expected_labels

    def convert_to_matrix(self, dictionary):
        length = len(dictionary)
        confusion_dictionary = np.zeros((length, length))

        i = 0
        for row in dictionary:
            j = 0
            for column in dictionary:
                confusion_dictionary[i][j] = dictionary[row][column]
                j += 1
            i += 1

        return confusion_dictionary

    def get_confusion_matrix(self):
        matrix = self.convert_to_matrix(self.confusion_dictionary)
        return self.normalize(matrix)

    def normalize(self, matrix):
        amin = np.amin(matrix)
        amax = np.amax(matrix)

        return [[(((y - amin) * (1 - 0)) / (amax - amin)) for y in x] for x in matrix]

    def plot(self):
        matrix = self.get_confusion_matrix()

        pylab.figure()
        pylab.imshow(matrix, interpolation='nearest', cmap=pylab.cm.jet)
        pylab.title("Confusion Matrix")

        for i, vi in enumerate(matrix):
            for j, vj in enumerate(vi):
                pylab.text(j, i + .1, "%.1f" % vj, fontsize=12)

        pylab.colorbar()

        classes = np.arange(len(self.labels))
        pylab.xticks(classes, self.labels)
        pylab.yticks(classes, self.labels)

        pylab.ylabel('Expected label')
        pylab.xlabel('Predicted label')

        pylab.show()
Answer 16 (score: 0)
In a general sense, you're going to need to change your probability array. Instead of having one number per instance and classifying based on whether or not it is greater than 0.5, you're going to need a list of scores (one for each class), and then take the largest of the scores as the class that was chosen (a.k.a. argmax).
You could use a dictionary to hold the probabilities for each classification:
prob_arr = [{classification_id: probability}, ...]
Selecting a classification would then look something like:
for instance_scores in prob_arr:
    predicted_classes = [cls for (cls, score) in instance_scores.items() if score == max(instance_scores.values())]
This handles the case where two classes have the same score. You can get a single classification by choosing the first one in that list, but how you handle that depends on what you're classifying.
Once you have your list of predicted classes and a list of expected classes, you can use code like Torsten Marek's to create the confusion array and compute the accuracy.