I am trying to do binary classification. Since my dataset is small (275 samples), I ran leave-one-out cross-validation and would like the averaged classification report and AUROC/AUPRC across all folds.
I have been closely following this link to produce the results, but I do not understand what the code does in its last line.
for i in classifiers:
    print(i)
    originalclass = []
    predictedclass = []
    model = i
    loo = LeaveOneOut()
    print('Scores before feature selection')
    scores = cross_val_score(model, subset, y, cv=loo, scoring=make_scorer(classification_report_with_accuracy_score))
    print("CV score", np.mean(cross_val_score(model, subset, y, cv=loo, scoring='roc_auc')))
    print(classification_report(originalclass, predictedclass))
    print('Scores after feature selection')
    X_reduced = feature_reduction_using_RFECV(model, subset, y)
    scores = cross_val_score(model, X_reduced, y, cv=loo, scoring=make_scorer(classification_report_with_accuracy_score))
    print("CV score", np.mean(cross_val_score(model, X_reduced, y, cv=loo, scoring='roc_auc')))
    print(classification_report(originalclass, predictedclass))
Where does the averaging happen in the code above? I compute the mean CV score and print it, but the line that follows confuses me the most. I initialize the originalclass and predictedclass variables at the start, but where are they used before being printed in the last line?
print(classification_report(originalclass, predictedclass))
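(For context: in the linked approach those two lists are presumably filled inside the custom scorer itself. The sketch below is an assumption based on the usual pattern from that link, not code shown in my question; the scorer's name is the only thing my code confirms.)

```python
from sklearn.metrics import accuracy_score

# Hypothetical scorer definition (assumed from the linked answer):
# each cross-validation fold appends its true and predicted labels to
# these module-level lists as a side effect, so they hold all folds'
# labels only after cross_val_score has finished.
originalclass = []
predictedclass = []

def classification_report_with_accuracy_score(y_true, y_pred):
    originalclass.extend(y_true)   # collect ground truth across folds
    predictedclass.extend(y_pred)  # collect predictions across folds
    return accuracy_score(y_true, y_pred)
```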
Edited code
for i in classifiers:
    print(i)
    originalclass = y
    model = i
    loo = LeaveOneOut()
    print('Scores before feature selection')
    y_pred = cross_val_predict(model, subset, y, cv=loo)
    print(classification_report(originalclass, y_pred))
    print("CV score", np.mean(cross_val_score(model, subset, y, cv=loo, scoring='roc_auc')))
    print(classification_report(originalclass, y_pred))
    print('Scores after feature selection')
    X_reduced = feature_reduction_using_RFECV(model, subset, y)
    y_pred = cross_val_predict(model, X_reduced, y, cv=loo)
    classification_report(originalclass, y_pred)
    print("CV score", np.mean(cross_val_score(model, X_reduced, y, cv=loo, scoring='roc_auc')))
    print(classification_report(originalclass, y_pred))
Answer 0 (score: 0)
When you use

print("CV score", np.mean(cross_val_score(model, X_reduced, y, cv=loo, scoring='roc_auc')))

you print the mean cross-validated roc_auc metric of your model under the given cv scheme (here, LeaveOneOut).
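As a minimal runnable sketch of this averaging (a synthetic dataset and LogisticRegression are stand-ins for your own subset and classifier; note that per-fold ROC AUC is undefined when a test fold contains a single sample, as it does with LeaveOneOut, so a 5-fold split is used here for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Stand-in data and model; replace with your own subset / classifier.
X, y = make_classification(n_samples=100, random_state=0)
model = LogisticRegression(max_iter=1000)

# One roc_auc value per fold; np.mean averages them into the "CV score".
scores = cross_val_score(model, X, y, cv=5, scoring='roc_auc')
print("CV score", np.mean(scores))
```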
The next command:

print(classification_report(originalclass, predictedclass))

is meant to print the full classification report, not the mean roc_auc metric of the previous line.
This command takes the following input arguments:

classification_report(y_true, y_pred)

where y_true (originalclass in your case) is the ground truth and y_pred should be the cross-validated predicted labels/classes.
You should have something like the following:

y_pred = cross_val_predict(model, X_reduced, y, cv=loo)
classification_report(originalclass, y_pred)

Now y_pred holds the cross-validated predictions of the labels, so the classification report will print cross-validated results for the classification metrics.
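Putting this together as a self-contained sketch (with a synthetic dataset and LogisticRegression as placeholders for your own data and classifier):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import classification_report

# Placeholder data and model; substitute your X_reduced and classifier.
X, y = make_classification(n_samples=40, random_state=0)
model = LogisticRegression(max_iter=1000)

# Each sample's prediction comes from a model trained on all the other
# samples, so the report below is a genuinely cross-validated report.
y_pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
print(classification_report(y, y_pred))
```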
A toy example to illustrate the above:
from sklearn.metrics import classification_report
originalclass = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
print(classification_report(originalclass, y_pred))
              precision    recall  f1-score   support

           0       0.50      1.00      0.67         1
           1       0.00      0.00      0.00         1
           2       1.00      0.67      0.80         3

   micro avg       0.60      0.60      0.60         5
   macro avg       0.50      0.56      0.49         5
weighted avg       0.70      0.60      0.61         5