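I am not an expert user. I know I can get the confusion matrix, but I would like to obtain a list of the rows that were classified incorrectly, so that I can study them after classification.
On Stack Overflow I found "Can I get a list of wrong predictions in SVM score function in scikit-learn", but I am not sure I understood everything.
Here is some example code.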
# importing necessary libraries
from sklearn import datasets
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
# loading the iris dataset
iris = datasets.load_iris()
# X -> features, y -> label
X = iris.data
y = iris.target
# dividing X, y into train and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 0)
# training a linear SVM classifier
from sklearn.svm import SVC
svm_model_linear = SVC(kernel = 'linear', C = 1).fit(X_train, y_train)
svm_predictions = svm_model_linear.predict(X_test)
# model accuracy for X_test
accuracy = svm_model_linear.score(X_test, y_test)
# creating a confusion matrix
cm = confusion_matrix(y_test, svm_predictions)
To iterate over the rows and find the wrong ones, the suggested solution is:
predictions = clf.predict(inputs)
for input, prediction, label in zip(inputs, predictions, labels):
    if prediction != label:
        print(input, 'has been classified as ', prediction, 'and should be ', label)
I do not understand what "inputs" / "input" refers to. If I adapt this code to my own, like this:
for input, prediction, label in zip(X_test, svm_predictions, y_test):
    if prediction != label:
        print(input, 'has been classified as ', prediction, 'and should be ', label)
I obtain:
[6. 2.7 5.1 1.6] has been classified as 2 and should be 1
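Is row 6 the misclassified row? What are the numbers after the "6."? I am asking because I use the same code on a much larger dataset, and I want to make sure I am doing this correctly. Unfortunately I cannot post that dataset, but the issue is that with it I obtain the following: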
(0, 253) 0.5339655767137572
(0, 601) 0.27665553856928027
(0, 1107) 0.7989633757962163 has been classified as 7 and should be 3
(0, 885) 0.3034934766501018
(0, 1295) 0.6432561790864061
(0, 1871) 0.7029318585026516 has been classified as 7 and should be 6
(0, 1020) 1.0 has been classified as 3 and should be 8
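When I count the lines of this last output, I get twice as many lines as there are rows in the test set... so I am not sure whether the list of predictions I am analysing is correct.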
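Answer 0 (score: 0)
Is row 6 the misclassified row? What are the numbers after the "6."?
No. [6. 2.7 5.1 1.6] is the actual sample (i.e. its feature values), not a row number. To get the index of the misclassified row, we should modify the for loop slightly: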
for idx, input, prediction, label in zip(enumerate(X_test), X_test, svm_predictions, y_test):
    if prediction != label:
        print("No.", idx[0], 'input,',input, ', has been classified as', prediction, 'and should be', label)
The result is now
No. 37 input, [ 6. 2.7 5.1 1.6] , has been classified as 2 and should be 1
This means that X_test[37] (that is, [ 6. 2.7 5.1 1.6]) was predicted as 2 by our SVM, while its true label is 1.
Let's confirm this:
X_test[37]
# array([ 6. , 2.7, 5.1, 1.6])
svm_predictions[37]
# 2
y_test[37]
# 1
This result agrees with your confusion matrix cm, which indeed shows only one misclassified sample in X_test:
cm
# result:
array([[13, 0, 0],
[ 0, 15, 1],
[ 0, 0, 9]], dtype=int64)
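The confusion matrix can also be used to check how many misclassified rows the loop should report. The following is a minimal sketch, assuming the scikit-learn convention that rows are true classes and columns are predicted classes; the matrix is copied by hand from the output above, and the names n_wrong and error_pairs are only illustrative.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows = true class, columns = predicted class).
cm = np.array([[13, 0, 0],
               [0, 15, 1],
               [0, 0, 9]])

# Everything off the diagonal is a misclassification.
n_wrong = int(cm.sum() - np.trace(cm))

# (true_class, predicted_class) pairs that contain at least one error.
off_diagonal = cm - np.diag(np.diag(cm))
error_pairs = [(int(t), int(p)) for t, p in zip(*np.nonzero(off_diagonal))]

print(n_wrong, error_pairs)
```

If the loop prints a different number of samples than n_wrong, the loop and the matrix are probably not looking at the same predictions.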
A more elegant for loop is possible, since enumerate already includes the sample itself:
for idx, prediction, label in zip(enumerate(X_test), svm_predictions, y_test):
    if prediction != label:
        print("Sample", idx, ', has been classified as', prediction, 'and should be', label)
which gives
Sample (37, array([ 6. , 2.7, 5.1, 1.6])) , has been classified as 2 and should be 1
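As for the output from the larger dataset: lines of the form "(0, 253) 0.5339..." look like the printed form of a SciPy sparse row (for example the output of a TF-IDF vectorizer), although the question does not say so, so treat this as an assumption. If that is the case, each printed sample spans one line per non-zero feature, and only the line ending in "has been classified as ..." closes a sample, which would explain why there are more output lines than test rows. The sketch below uses small hand-made stand-ins (X_sparse, labels and preds are hypothetical) and reports the row index rather than printing the row itself.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical stand-ins for a sparse test set, its true labels
# and the model's predictions.
X_sparse = csr_matrix(np.array([
    [0.0, 0.5, 0.0, 0.8],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.3, 0.0],
]))
labels = np.array([3, 8, 6])
preds = np.array([7, 3, 6])

# Positions of the rows where prediction and label disagree.
wrong_rows = np.flatnonzero(preds != labels)

report = []
for i in wrong_rows:
    row = X_sparse[int(i)]           # a 1 x n_features sparse row
    nonzero_cols = row.indices.tolist()
    report.append((int(i), nonzero_cols, int(preds[i]), int(labels[i])))
    print(f"Row {i}: non-zero feature columns {nonzero_cols}, "
          f"classified as {preds[i]}, should be {labels[i]}")
```

Counting len(wrong_rows) gives the number of misclassified samples directly, regardless of how many lines each sample takes when printed.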
Answer 1 (score: 0)
If you just want a list of the misclassified instances, you can do the following:
import numpy as np
# predictions is the model's output for X_test (svm_predictions in the question's code)
# Boolean mask that is True wherever the prediction differs from the true label
mask = np.logical_not(np.equal(y_test, predictions))
# Use the mask to select the misclassified elements
print(f"Misclassified elements: {X_test[mask]}")
print(f"Model prediction for each of those elements: {predictions[mask]}")
print(f"Actual value for each of those elements: {np.asarray(y_test)[mask]}")
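Positions in X_test are not positions in the original dataset, because train_test_split shuffles the rows. If the goal is to go back and study the original rows, one option is to pass an array of row numbers through the split as well. This is a sketch under that assumption, reusing the question's iris setup; row_ids, ids_train, ids_test and wrong_original_rows are names introduced here for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Row numbers of the original dataset, split together with X and y
# so that every test sample keeps a pointer to its source row.
row_ids = np.arange(len(X))
X_train, X_test, y_train, y_test, ids_train, ids_test = train_test_split(
    X, y, row_ids, random_state=0
)

svm_predictions = SVC(kernel="linear", C=1).fit(X_train, y_train).predict(X_test)

# True where the prediction differs from the label.
mask = svm_predictions != y_test

# Original row numbers of the misclassified test samples.
wrong_original_rows = ids_test[mask]
print(wrong_original_rows)
```

X[wrong_original_rows] then returns the same samples as X_test[mask], but addressed by their position in the full dataset, which is usually what is needed to look them up in the source file.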