Question

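I am classifying my data with a variety of algorithms. After fitting each model, I analyze the results with:

from sklearn.metrics import confusion_matrix, classification_report
print(classification_report(y_test, predicted))

This gives the following confusion matrices and classification reports: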

confusion matrix (knn)
               Predicted Negative  Predicted Positive
True Negative               14776                5442
True Positive                2367                6337
             precision    recall  f1-score   support

          f       0.73      0.86      0.79     17143
          t       0.73      0.54      0.62     11779

avg / total       0.73      0.73      0.72     28922

confusion matrix (SVM)
               Predicted Negative  Predicted Positive
True Negative               14881                4947
True Positive                2262                6832
             precision    recall  f1-score   support

          f       0.75      0.87      0.81     17143
          t       0.75      0.58      0.65     11779

avg / total       0.75      0.75      0.74     28922

confusion matrix (logistic regression)
               Predicted Negative  Predicted Positive
True Negative               14881                4947
True Positive                2262                6832
             precision    recall  f1-score   support

          f       0.75      0.87      0.81     17143
          t       0.75      0.58      0.65     11779

avg / total       0.75      0.75      0.74     28922

confusion matrix (decision tree)
               Predicted Negative  Predicted Positive
True Negative               14852                4941
True Positive                2291                6838
             precision    recall  f1-score   support

          f       0.75      0.87      0.80     17143
          t       0.75      0.58      0.65     11779

avg / total       0.75      0.75      0.74     28922

confusion matrix (naive_bayes)
               Predicted Negative  Predicted Positive
True Negative               13435                4759
True Positive                3708                7020
             precision    recall  f1-score   support

          f       0.74      0.78      0.76     17143
          t       0.65      0.60      0.62     11779

avg / total       0.70      0.71      0.70     28922

confusion matrix (random_forest)
               Predicted Negative  Predicted Positive
True Negative               13287                5248
True Positive                3856                6531
             precision    recall  f1-score   support

          f       0.72      0.78      0.74     17143
          t       0.63      0.55      0.59     11779

avg / total       0.68      0.69      0.68     28922

confusion matrix (gradient_boost)
               Predicted Negative  Predicted Positive
True Negative               15071                5583
True Positive                2072                6196
             precision    recall  f1-score   support

          f       0.73      0.88      0.80     17143
          t       0.75      0.53      0.62     11779

avg / total       0.74      0.74      0.72     28922


confusion matrix (neural network MLPClassifier)
               Predicted Negative  Predicted Positive
True Negative               10789                3653
True Positive                6354                8126
             precision    recall  f1-score   support

          f       0.75      0.63      0.68     17143
          t       0.56      0.69      0.62     11779

avg / total       0.67      0.65      0.66     28922

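I am not entirely clear on what "Predicted Positive", "Predicted Negative" and so on mean in terms of the labels the model is trying to predict.

Perhaps more importantly, I do not understand, and am trying to work out, why all of the algorithms do relatively well in the "Predicted Negative" column (True Negative vs. True Positive) but so poorly in the "Predicted Positive" column.

In other words, as I understand it, the models are good at saying "is not something" but are basically flipping a coin (roughly 50-50) when predicting "is something".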

The classification reports above are samples I generated for the different techniques.

The only one that seems to do a reasonable job in the "Predicted Positive" column is the MLPClassifier.

Answer 1

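Sorry, I don't know what your dataset looks like. But let's say we have a coin-flip experiment with two outcomes, heads (1) or tails (0). Now we train a classifier (logistic regression, for example) to predict the outcome from a set of candidate features.

If a prediction is correct (it matches the class label), we count it as a True prediction; if not, it is a False one. If the algorithm outputs "heads", that is treated as a Positive result, and "tails" is a Negative one.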
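To tie these four terms to scikit-learn: `confusion_matrix(y_true, y_pred)` puts the true classes on the rows and the predicted classes on the columns. Here is a minimal sketch; the `y_true` and `y_pred` lists are made-up coin flips for illustration, not data from the question.

```python
from sklearn.metrics import confusion_matrix

# Made-up coin flips: "t" = heads (positive), "f" = tails (negative).
y_true = ["t", "t", "t", "f", "f", "f", "f", "t"]
y_pred = ["t", "f", "t", "f", "f", "t", "f", "t"]

# Rows are the true classes, columns are the predicted classes,
# both in the order given by `labels`.
cm = confusion_matrix(y_true, y_pred, labels=["f", "t"])

# For a binary problem, ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = cm.ravel()
print(cm)
print("TN, FP, FN, TP =", tn, fp, fn, tp)
```

Passing `labels` explicitly fixes which class ends up in which row and column, so the positive class is not left to sort order.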

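On its own, the "True Positive" count does not tell you much. But if we add the "False Negative" count to it, the sum is the number of actual positive cases.

If we divide "True Positive" by that total number of positive cases, we get the model's accuracy on the positive (heads) cases. This is usually called "recall" or the TP rate.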
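As a worked check, the numbers in the question's kNN table can be plugged into these definitions. The sketch below assumes that the columns of that table hold the actual classes, because the column sums (not the row sums) match the `support` values in the report; that reading is an inference from the numbers, not something the question states.

```python
# kNN table from the question, copied as printed (first row, then second row).
table = [[14776, 5442],
         [2367, 6337]]

row_sums = [sum(row) for row in table]
col_sums = [sum(col) for col in zip(*table)]
print("row sums:", row_sums)
print("column sums:", col_sums)  # these match the report's support: 17143 and 11779

# Assumption: columns = actual classes, rows = predicted classes, "t" = positive.
tp = table[1][1]                 # actual t, predicted t
recall_t = tp / col_sums[1]      # TP / all actual positives
precision_t = tp / row_sums[1]   # TP / all predicted positives
print("recall(t) =", round(recall_t, 2))
print("precision(t) =", round(precision_t, 2))
```

Under that assumption the results round to 0.54 and 0.73, the same recall and precision the kNN report lists for class `t`. So the roughly 50-50 split in the second column is the recall of the positive class, not its precision.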

We can compare the TP rate (TP / P) with the FP rate (FP / N) to analyze the performance of a given model.
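A small sketch of that comparison, using counts read from the question's kNN and MLPClassifier tables. The helper `rates` is a hypothetical name introduced here, and the counts assume that `t` is the positive class and that the table columns are the actual classes (their sums match the reported support).

```python
def rates(tp, fn, fp, tn):
    """Return (TP rate, FP rate) from the four confusion-matrix counts."""
    tpr = tp / (tp + fn)  # TP / P
    fpr = fp / (fp + tn)  # FP / N
    return tpr, fpr

# Counts taken from the question's tables under the stated assumption.
models = {
    "knn": dict(tp=6337, fn=5442, fp=2367, tn=14776),
    "MLPClassifier": dict(tp=8126, fn=3653, fp=6354, tn=10789),
}

for name, counts in models.items():
    tpr, fpr = rates(**counts)
    print(f"{name}: TPR={tpr:.2f} FPR={fpr:.2f}")
```

Read this way, the MLPClassifier finds more of the positives (TPR about 0.69 versus 0.54) but also raises more false alarms (FPR about 0.37 versus 0.14), so it sits at a different trade-off point and is not simply better.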

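There are other combinations and uses of Positive, Negative, True, False and the rates built from them, including sensitivity and specificity.

If you want to learn more, I suggest looking into the ROC curve.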
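For reference, a ROC curve can be computed with scikit-learn's `roc_curve` roughly as follows. This is only a sketch on synthetic data from `make_classification`; the dataset, the choice of `LogisticRegression` and all parameter values are assumptions for illustration, so substitute your own features, labels and model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; replace with your own X and y).
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# ROC needs scores rather than hard predictions: use the probability of class 1.
scores = clf.predict_proba(X_test)[:, 1]

# Each threshold gives one (FP rate, TP rate) point on the curve.
fpr, tpr, thresholds = roc_curve(y_test, scores)
auc = roc_auc_score(y_test, scores)
print("AUC:", auc)
```

Plotting `tpr` against `fpr` shows how the TP rate and FP rate trade off as the decision threshold moves, which is a fairer way to compare classifiers than a single confusion matrix taken at one threshold.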

All classifiers predict "bad" positives
