Question

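I would like to use the AUC of the Precision-Recall curve as the metric for training my model. Do I need to write a dedicated scorer for this when using cross-validation?

Consider the reproducible example below. Note the imbalanced target variable.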

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, RepeatedStratifiedKFold, cross_val_score

# generate 2 class dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42, weights=[.95])
# split into train/test sets
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.2, random_state=2)

def evaluate_model(X, y, model):
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=42)
    scores = cross_val_score(model, X, y, scoring='roc_auc', cv=cv, n_jobs=-1)
    return scores

model = LogisticRegression(solver='liblinear')
scores = evaluate_model(X=trainX, y=trainy, model=model)
scores

I don't believe the roc_auc scorer measures the AUC of the Precision-Recall curve. How can I implement such a scorer for cross-validation?

Answer 1

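"Average precision" is probably what you want; it measures a non-interpolated area under the PR curve. See the last few paragraphs of this example and this section of the User Guide.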
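
To make "non-interpolated area" concrete, here is a minimal sketch that recomputes average precision by hand as a step-wise sum over the PR curve, AP = sum_n (R_n - R_{n-1}) * P_n, and compares it with average_precision_score. The toy labels and scores are made up purely for illustration.

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

# Toy data, invented for illustration only.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# precision_recall_curve returns recall in decreasing order, so the
# differences are negative and the sum is negated.
precision, recall, _ = precision_recall_curve(y_true, y_score)
manual_ap = -np.sum(np.diff(recall) * precision[:-1])

library_ap = average_precision_score(y_true, y_score)
print(manual_ap, library_ap)
```

Because no interpolation between operating points is applied, this value can differ from a trapezoidal area computed over the same curve.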

For the scorer, use scoring="average_precision"; the metric function is average_precision_score.
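
Applied to the setup from the question, the change could look like the sketch below. It assumes the same synthetic data and cross-validation settings as the question and only swaps the scoring string; n_jobs is left at its default here to keep the sketch simple.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    RepeatedStratifiedKFold,
    cross_val_score,
    train_test_split,
)

# Same imbalanced two-class dataset as in the question.
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42, weights=[.95])
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.2, random_state=2)


def evaluate_model(X, y, model):
    # Stratified folds keep the rare positive class present in every split.
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=42)
    # 'average_precision' is the built-in scorer backed by average_precision_score.
    return cross_val_score(model, X, y, scoring='average_precision', cv=cv)


model = LogisticRegression(solver='liblinear')
scores = evaluate_model(X=trainX, y=trainy, model=model)
print(scores.mean(), scores.std())
```

No custom scorer is needed in this case, since the string name already resolves to a scorer that uses the model's continuous scores rather than hard class predictions.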
