Question

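I split my data into a test set and a training set, both of which contain the target values '0' and '1'. However, after fitting and predicting with an SVM, the classification report says there are zero '0' samples in the test set, which is not true.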

import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
df = pd.DataFrame(data=data['data'], columns=data['feature_names'])
x = df
y = data['target']
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.3, random_state=42)

As shown below, the test set has both 0s and 1s, but the support column in the classification report shows no 0s!

Image: https://i.imgur.com/wjEjIvX.png

Answer 1

(It is always a good idea to include the relevant code in the post itself, not in an image.)

"the classification report says there are zero '0' samples in the test set, which is not true."

This is because, judging from the code in the linked image, you have switched the arguments in classification_report; you used:

print(classification_report(pred, ytest)) # wrong order of arguments

which indeed gives:

             precision    recall  f1-score   support

    class 0       0.00      0.00      0.00         0
    class 1       1.00      0.63      0.77       171

avg / total       1.00      0.63      0.77       171

but the correct usage (see the docs), with the true labels as the first argument, is

print(classification_report(ytest, pred)) # ytest first

which gives

             precision    recall  f1-score   support

    class 0       0.00      0.00      0.00        63
    class 1       0.63      1.00      0.77       108

avg / total       0.40      0.63      0.49       171

along with the following warning message:

C:\Users\Root\Anaconda3\envs\tensorflow1\lib\site-packages\sklearn\metrics\classification.py:1135: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.
  'precision', 'predicted', average, warn_for)

because, as already pointed out in the comments, your model predicts only 1s:

pred
# result:
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

Why the model predicts only 1s is another story, and not part of the current question.
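To see the effect of the argument order in isolation, here is a minimal sketch with small hand-made label arrays (not the breast cancer data). The variable names are illustrative, and the `output_dict` and `zero_division` parameters assume a reasonably recent scikit-learn (0.22 or later); on older versions, drop `zero_division` and expect the warning shown above.

```python
from sklearn.metrics import classification_report

# Hypothetical toy labels: three true 0s, five true 1s,
# and a model that predicts 1 for every sample.
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [1, 1, 1, 1, 1, 1, 1, 1]

# Correct order: true labels first, predictions second.
correct = classification_report(y_true, y_pred, output_dict=True, zero_division=0)

# Swapped order: the predictions are treated as the ground truth,
# so support is counted from y_pred, which contains no 0s.
swapped = classification_report(y_pred, y_true, output_dict=True, zero_division=0)

print("support for class 0, correct order:", correct['0']['support'])
print("support for class 0, swapped order:", swapped['0']['support'])
```

The support column always counts occurrences in the first argument, so a support of 0 for a class you know is present in the test set is a hint that the arguments may be reversed.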

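Here is complete reproducible code for the breast cancer example discussed above, with the arguments in the correct order: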

from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
xtrain,xtest,ytrain,ytest = train_test_split(X,y,test_size=0.3,random_state=42)

from sklearn.svm import SVC
svc=SVC()
svc.fit(xtrain, ytrain)
pred = svc.predict(xtest)

print(classification_report(ytest, pred))
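As a sanity check before reading the report, it can help to count the classes in `ytest` and in `pred` directly. The sketch below does that, and also standardizes the features before the SVM. Treat the scaling step as an assumption rather than a diagnosis of the original problem: unscaled features are one commonly suggested reason for an RBF-kernel SVC predicting a single class, and with newer scikit-learn versions, where the default `gamma` differs, the unscaled code above may not reproduce the all-1 predictions at all.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.3, random_state=42)

# Count how many samples of each class the test set really contains.
true_labels, true_counts = np.unique(ytest, return_counts=True)
print("test set:", dict(zip(true_labels.tolist(), true_counts.tolist())))

# Assumption: standardize the features before fitting the SVM.
model = make_pipeline(StandardScaler(), SVC())
model.fit(xtrain, ytrain)
pred = model.predict(xtest)

# Count how many samples of each class the model predicts.
pred_labels, pred_counts = np.unique(pred, return_counts=True)
print("predictions:", dict(zip(pred_labels.tolist(), pred_counts.tolist())))

# True labels first, predictions second.
print(classification_report(ytest, pred))
```

If the two printed counts both list classes 0 and 1, every row of the report has a nonzero support and at least some predictions, so the UndefinedMetricWarning should not appear.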

Classification report shows wrong test-set support for machine learning with SVM in Python
