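I have a k-NN classifier with K = 3 and 10-fold cross-validation, shown below. It runs on a dataset whose label can take 2 values: background/normal and bot. When I run it on the dataset provided below, it produces the following confusion matrix: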
confusion matrix
Background/Normal  [[23553   136]
Bot                 [  237  1786]]
and the following metrics:
                   precision    recall  f1-score   support
Background/Normal       0.99      0.99      0.99     23689
              Bot       0.93      0.88      0.91      2023
        micro avg       0.99      0.99      0.99     25712
        macro avg       0.96      0.94      0.95     25712
     weighted avg       0.99      0.99      0.99     25712
Cross Val Score
[0.97773673 0.98288936 0.98425044 0.96869531 0.96800856 0.98064955 0.97851031 0.97782964 0.95702061 0.95439518]
Cross val avg: 0.9729985691978758
RocAUC score: 0.9725001834719879
The only problem is generating the ROC curve plot. I don't understand why I get this error: the metric call roc_auc_score produces a value, but the plot shows AUC = nan.
print(metrics.roc_auc_score(y_test, y_pred_prob, average='micro'))
OUTPUT: RocAUC score: 0.9725001834719879
Plot output showing the error: [image]
and this warning is displayed:
Python3\lib\site-packages\sklearn\metrics\ranking.py:656: UndefinedMetricWarning: No positive samples in y_true, true positive value should be meaningless
  UndefinedMetricWarning)
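The warning text points at the number of positive samples found in y_true. As a rough illustration (plain Python, not scikit-learn's actual implementation), the sketch below shows how a true positive rate becomes undefined when no sample equals the chosen positive label. The 0/1 label encoding, the sample scores, and the helper name `rates_at_threshold` are assumptions made up for this example.

```python
import math


def rates_at_threshold(y_true, y_score, pos_label, threshold):
    """Return (fpr, tpr) when scores >= threshold are predicted positive.

    A rate whose denominator is zero is returned as NaN.
    """
    # A sample counts as positive only if its label equals pos_label
    is_pos = [label == pos_label for label in y_true]
    predicted = [score >= threshold for score in y_score]

    tp = sum(p and q for p, q in zip(is_pos, predicted))
    fp = sum((not p) and q for p, q in zip(is_pos, predicted))
    n_pos = sum(is_pos)
    n_neg = len(is_pos) - n_pos

    tpr = tp / n_pos if n_pos else math.nan
    fpr = fp / n_neg if n_neg else math.nan
    return fpr, tpr


# Hypothetical labels and scores (assumed 0/1 encoding)
y_true = [0, 0, 1, 1]
y_score = [0.0, 0.33, 0.67, 1.0]

# pos_label present in y_true: both rates are defined
print(rates_at_threshold(y_true, y_score, pos_label=1, threshold=0.5))
# pos_label absent from y_true: there are no positives, so TPR is NaN
print(rates_at_threshold(y_true, y_score, pos_label=2, threshold=0.5))
```

If every TPR value is NaN, any area computed from those values would be NaN as well, which would be consistent with the AUC = nan shown in the plot legend.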
Code:
import pandas as pd
import time
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn import metrics
from sklearn.metrics import roc_curve
from sklearn.metrics import auc


def openfile():
    # Load the dataset from the CSV file
    df = pd.read_csv('TestfileBEB - kNN.csv')
    return df


def main():
    dataset = openfile()

    # Features are every column except 'Label'; 'Label' is the target
    X = dataset.drop(columns=['Label'])
    y = dataset['Label'].values
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # k-NN with K = 3 and Euclidean distance
    Classifier = KNeighborsClassifier(n_neighbors=3, p=2, metric='euclidean')
    Classifier.fit(X_train, y_train)
    y_pred_class = Classifier.predict(X_test)

    # 10-fold cross-validation on the full dataset
    score = cross_val_score(Classifier, X, y, cv=10)

    # Predicted probability of the second class in Classifier.classes_
    y_pred_prob = Classifier.predict_proba(X_test)[:, 1]

    print("accuracy_score:", metrics.accuracy_score(y_test, y_pred_class),'\n')
    print("confusion matrix")
    print(metrics.confusion_matrix(y_test, y_pred_class),'\n')
    print(metrics.classification_report(y_test, y_pred_class, digits=2),'\n')
    print("Cross Val Score")
    print(score,'\n')
    print("Cross val avg:", score.mean(),'\n')
    print(metrics.roc_auc_score(y_test, y_pred_prob, average='micro'))

    # ROC curve for the plot (pos_label=2, as in the run that shows AUC = nan)
    fpr, tpr, threshold = roc_curve(y_test, y_pred_prob, pos_label=2)
    roc_auc = auc(fpr, tpr)

    plt.title('Receiver Operating Characteristic')
    plt.plot(fpr, tpr, 'b', label='AUC = %0.2f' % roc_auc)
    plt.legend(loc='lower right')
    plt.plot([0, 1], [0, 1], 'r--')
    plt.xlim([0, 1])
    plt.ylim([0, 1])
    plt.ylabel('True Positive Rate')
    plt.xlabel('False Positive Rate')
    plt.title('ROC Curve of kNN')
    plt.show()


if __name__ == '__main__':
    main()
I would like to know what is wrong with the code and how to improve it so that the metric is displayed correctly.
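One sanity check that might be worth trying here is to compare the value passed as pos_label with the labels that actually occur in y_test. The sketch below is only an illustration on made-up data: the arrays stand in for y_test and y_pred_prob, and the 0/1 encoding with 1 meaning Bot is an assumption, since the encoding of the Label column is not shown above.

```python
import math

import numpy as np
from sklearn.metrics import auc, roc_curve

# Hypothetical stand-ins for y_test and y_pred_prob (assumed 0/1 encoding)
y_test = np.array([0, 0, 1, 1])
y_pred_prob = np.array([0.1, 0.4, 0.35, 0.8])

# List the labels that really occur in the test targets
labels_present = np.unique(y_test)
print("labels in y_test:", labels_present)

# Assumption: 1 is the Bot class; it must be one of the labels present
pos_label = 1
if pos_label not in labels_present:
    raise ValueError("pos_label %r does not occur in y_test" % pos_label)

fpr, tpr, thresholds = roc_curve(y_test, y_pred_prob, pos_label=pos_label)
roc_auc = auc(fpr, tpr)
print("AUC = %0.2f" % roc_auc)
```

With a pos_label that occurs in the targets, tpr holds finite values and the area under the curve is a number rather than nan. Whether the same applies to the real dataset depends on how Label is encoded there.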