Question

I am using StratifiedKFold to check my classifier's performance. I have two classes, and I am trying to build a logistic regression classifier. Here is my code:

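import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

# x: NumPy array of raw text documents, y: NumPy array of class labels
r = []  # per-fold accuracy scores
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for train_index, test_index in skf.split(x, y):
    x_train, x_test = x[train_index], x[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Fit the vectorizer on the training fold only, then reuse it on the test fold
    tfidf = TfidfVectorizer()
    x_train = tfidf.fit_transform(x_train)
    x_test = tfidf.transform(x_test)

    clf = LogisticRegression(class_weight='balanced')
    clf.fit(x_train, y_train)
    y_pred = clf.predict(x_test)
    score = accuracy_score(y_test, y_pred)
    r.append(score)
    print(score)

print(np.mean(r))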

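I can only print the accuracy score, but I don't know how to print the confusion matrix and the classification report. If I just add a print statement inside the loop,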

print(confusion_matrix(y_test, y_pred))

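it prints 10 times, but I want a single report and a single matrix for the classifier's final performance.

Any help on how to compute the matrix and the report would be appreciated. Thanks.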

Answer 1

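Cross-validation is used to assess how a particular model or set of hyperparameters performs across different splits of the dataset. At the end you don't have a final performance per se; you have the individual performance of each split and the aggregated performance across the splits. You could take tn, fn, fp, and tp from each fold and compute aggregated precision, recall, sensitivity, and so on yourself, but you can also just use the predefined sklearn functions for those metrics and aggregate them at the end.

For example: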

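import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
accs, precs, recs = [], [], []  # per-fold metrics
for train_index, test_index in skf.split(x, y):
    x_train, x_test = x[train_index], x[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Fit the vectorizer on the training fold only, then reuse it on the test fold
    tfidf = TfidfVectorizer()
    x_train = tfidf.fit_transform(x_train)
    x_test = tfidf.transform(x_test)

    clf = LogisticRegression(class_weight='balanced')
    clf.fit(x_train, y_train)
    y_pred = clf.predict(x_test)
    acc = accuracy_score(y_test, y_pred)
    # precision_score and recall_score default to pos_label=1 for binary labels
    prec = precision_score(y_test, y_pred)
    rec = recall_score(y_test, y_pred)
    accs.append(acc)
    precs.append(prec)
    recs.append(rec)
    print(f'Accuracy: {acc}, Precision: {prec}, Recall: {rec}')

# Aggregate by averaging the per-fold values
print(f'Mean Accuracy: {np.mean(accs)}, Mean Precision: {np.mean(precs)}, Mean Recall: {np.mean(recs)}')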
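If a single matrix and a single report are wanted rather than ten, one possible approach is to pool the out-of-fold results: sum the per-fold confusion matrices and collect every fold's true and predicted labels, then call classification_report once on the pooled labels. The following is only a sketch under assumptions. The toy corpus standing in for x and y is invented for illustration, and pooled metrics are not the same quantity as the per-fold means printed above, so the two can differ slightly.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy corpus standing in for the question's x and y.
x = np.array(
    [f"good great film number {i}" for i in range(20)]
    + [f"bad awful film number {i}" for i in range(20)]
)
y = np.array([1] * 20 + [0] * 20)

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
labels = np.unique(y)
total_cm = np.zeros((len(labels), len(labels)), dtype=int)
y_true_all, y_pred_all = [], []

for train_index, test_index in skf.split(x, y):
    # Fit the vectorizer on the training fold only
    tfidf = TfidfVectorizer()
    x_train = tfidf.fit_transform(x[train_index])
    x_test = tfidf.transform(x[test_index])

    clf = LogisticRegression(class_weight='balanced')
    clf.fit(x_train, y[train_index])
    y_pred = clf.predict(x_test)

    # Passing labels= keeps every fold's matrix the same shape,
    # even if a fold happens to miss a class among its predictions.
    total_cm += confusion_matrix(y[test_index], y_pred, labels=labels)
    y_true_all.extend(y[test_index])
    y_pred_all.extend(y_pred)

# One matrix and one report covering all out-of-fold predictions
print(total_cm)
print(classification_report(y_true_all, y_pred_all))
```

Because each sample lands in exactly one test fold, the summed matrix counts every sample once, and it matches the confusion matrix computed directly from the pooled labels.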

Confusion matrix and classification report for StratifiedKFold
