ROC curve not displayed on the plot (Sklearn-kNN)

Posted: 2019-04-04 21:07:31

Tags: python machine-learning scikit-learn knn roc

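I have a k-NN classifier with K = 3 and 10-fold cross-validation (cv = 10), shown below. It runs on a dataset whose label takes one of two values: background/normal or bot. Running it on the dataset linked below produces the following confusion matrix: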

       confusion matrix
Background/Normal [[23553   136]
Bot               [  237  1786]] 

and the following metrics:

                   precision    recall  f1-score   support

Background/Normal       0.99      0.99      0.99     23689
              Bot       0.93      0.88      0.91      2023

        micro avg       0.99      0.99      0.99     25712
        macro avg       0.96      0.94      0.95     25712
     weighted avg       0.99      0.99      0.99     25712
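As a cross-check that the report and the confusion matrix agree, the per-class figures can be recomputed by hand. This plain-Python sketch assumes scikit-learn's convention that rows of the confusion matrix are true labels and columns are predicted labels; the helper `per_class_metrics` is hypothetical and not part of the original post.

```python
# Confusion matrix copied from the post: rows = true class, columns = predicted class.
cm = [[23553, 136],   # true Background/Normal
      [237, 1786]]    # true Bot
labels = ["Background/Normal", "Bot"]


def per_class_metrics(cm, k):
    """Return precision, recall, F1 and support for class index k (hypothetical helper)."""
    tp = cm[k][k]
    predicted_k = sum(row[k] for row in cm)  # column sum: everything predicted as k
    actual_k = sum(cm[k])                    # row sum: everything truly k (support)
    precision = tp / predicted_k
    recall = tp / actual_k
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1, actual_k


total = sum(sum(row) for row in cm)
accuracy = sum(cm[k][k] for k in range(len(cm))) / total

for k, name in enumerate(labels):
    p, r, f1, support = per_class_metrics(cm, k)
    print(f"{name:>17}  {p:.2f}  {r:.2f}  {f1:.2f}  {support}")
print(f"accuracy (= micro avg for single-label data): {accuracy:.4f}")
```

Both rows reproduce the reported precision, recall, F1 and support, and the accuracy rounds to the 0.99 shown as the micro average (for single-label classification the micro average equals accuracy).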


Cross Val Score
[0.97773673 0.98288936 0.98425044 0.96869531 0.96800856 0.98064955 0.97851031 0.97782964 0.95702061 0.95439518] 

Cross val avg: 0.9729985691978758 

RocAUC score: 0.9725001834719879

CTU13/10 - Dataset Link

The only problem is generating the ROC curve plot. I don't understand why this happens: `roc_auc_score` produces a value, yet the plot shows AUC = nan.

print(metrics.roc_auc_score(y_test, y_pred_prob, average='micro'))
OUTPUT: RocAUC score: 0.9725001834719879

Plot output: [image not available; the legend shows AUC = nan]

and the following warning is shown:

  

Python3\lib\site-packages\sklearn\metrics\ranking.py:656: UndefinedMetricWarning: No positive samples in y_true, true positive value should be meaningless
  UndefinedMetricWarning)
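The warning and the `nan` can be reproduced on toy data. This sketch assumes (the post does not say) that the labels are encoded as 0/1; the arrays are hypothetical. `roc_curve` counts as positives only the samples equal to `pos_label`, so with `pos_label=2` no sample qualifies, every TPR becomes `nan`, and so does `auc`. `roc_auc_score` has no `pos_label` argument and treats the greater of the two labels as the positive class, which is why it still returns a number. `pos_label=2` is only correct when the positive class is literally labelled 2, as in the labels-1-and-2 example in the `roc_curve` docstring.

```python
import warnings

import numpy as np
from sklearn.exceptions import UndefinedMetricWarning
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Hypothetical labels (0 = Background/Normal, 1 = Bot) and positive-class scores.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# roc_auc_score takes no pos_label: the greater label (1) is the positive class.
score = roc_auc_score(y_true, y_score)

# pos_label=2 matches no sample, so roc_curve sees zero positives and warns.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fpr_bad, tpr_bad, _ = roc_curve(y_true, y_score, pos_label=2)
auc_bad = auc(fpr_bad, tpr_bad)

# pos_label=1 names the class whose probabilities are in y_score.
fpr_ok, tpr_ok, _ = roc_curve(y_true, y_score, pos_label=1)
auc_ok = auc(fpr_ok, tpr_ok)

print("roc_auc_score:", score)           # 0.75
print("AUC with pos_label=2:", auc_bad)  # nan
print("AUC with pos_label=1:", auc_ok)   # 0.75
```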

Program code:

import pandas as pd
import time
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn import metrics
from sklearn.metrics import roc_curve
from sklearn.metrics import auc

def openfile():
    df = pd.read_csv('TestfileBEB - kNN.csv')

    return df

def main():

    dataset = openfile()  # load the CSV into a DataFrame

    X = dataset.drop(columns=['Label'])
    y = dataset['Label'].values

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    Classifier = KNeighborsClassifier(n_neighbors=3, p=2, metric='euclidean')
    Classifier.fit(X_train, y_train)

    y_pred_class = Classifier.predict(X_test)

    score = cross_val_score(Classifier, X, y, cv=10)

    y_pred_prob = Classifier.predict_proba(X_test)[:, 1]


    print("accuracy_score:", metrics.accuracy_score(y_test, y_pred_class),'\n')
    print("confusion matrix")
    print(metrics.confusion_matrix(y_test, y_pred_class),'\n')
    print(metrics.classification_report(y_test, y_pred_class, digits=2),'\n')
    print("Cross Val Score")
    print(score,'\n')
    print("Cross val avg:", score.mean(),'\n')
    print(metrics.roc_auc_score(y_test, y_pred_prob, average='micro'))


    fpr, tpr, threshold = roc_curve(y_test, y_pred_prob, pos_label=2)
    roc_auc = auc(fpr, tpr)
    plt.title('Receiver Operating Characteristic')
    plt.plot(fpr, tpr, 'b', label='AUC = %0.2f' % roc_auc)
    plt.legend(loc='lower right')
    plt.plot([0, 1], [0, 1], 'r--')
    plt.xlim([0, 1])
    plt.ylim([0, 1])
    plt.ylabel('True Positive Rate')
    plt.xlabel('False Positive Rate')
    plt.title('ROC Curve of kNN')
    plt.show()


if __name__ == "__main__":
    main()

I'd like to know what is wrong with the code and how to fix it so that the AUC is displayed correctly on the plot.
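A minimal sketch of one possible fix, under stated assumptions: the CSV is replaced by synthetic, imbalanced data from `make_classification`, and the label strings are hypothetical stand-ins for the dataset's two classes. The key change is passing `pos_label=clf.classes_[1]`, the class whose probabilities `predict_proba(X_test)[:, 1]` returns, instead of the hard-coded `2`. With that choice the AUC computed from the curve should match `roc_auc_score`. The function names `knn_roc` and `plot_roc` are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def knn_roc(X, y, n_neighbors=3):
    """Fit k-NN and return (fpr, tpr, AUC from the curve, roc_auc_score)."""
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    clf = KNeighborsClassifier(n_neighbors=n_neighbors, metric="euclidean")
    clf.fit(X_train, y_train)

    # classes_ is sorted; column 1 of predict_proba belongs to classes_[1].
    pos_label = clf.classes_[1]
    y_score = clf.predict_proba(X_test)[:, 1]

    fpr, tpr, _ = roc_curve(y_test, y_score, pos_label=pos_label)
    return fpr, tpr, auc(fpr, tpr), roc_auc_score(y_test, y_score)


def plot_roc(fpr, tpr, roc_auc):
    """Draw the ROC curve; matplotlib is imported here so knn_roc works without it."""
    import matplotlib.pyplot as plt

    plt.plot(fpr, tpr, "b", label="AUC = %0.2f" % roc_auc)
    plt.plot([0, 1], [0, 1], "r--")
    plt.xlim([0, 1])
    plt.ylim([0, 1])
    plt.xlabel("False Positive Rate")
    plt.ylabel("True Positive Rate")
    plt.title("ROC Curve of kNN")
    plt.legend(loc="lower right")
    plt.show()


# Synthetic stand-in for the CSV, roughly 92% / 8% like the post's class balance.
X, y_int = make_classification(n_samples=2000, weights=[0.92], random_state=0)
# Hypothetical string labels; sorted order puts "Bot" at classes_[1].
y = np.where(y_int == 1, "Bot", "Background/Normal")

fpr, tpr, roc_auc, score = knn_roc(X, y)
print("AUC from curve:", roc_auc, "roc_auc_score:", score)
```

Calling `plot_roc(fpr, tpr, roc_auc)` afterwards draws the figure. Reading the positive label from `classes_` rather than hard-coding it keeps the curve consistent with `predict_proba` whether the labels are integers or strings.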

0 answers