Question

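I know the difference between the precision and recall metrics in machine learning. One optimizes for false positives and the other for false negatives. In statistics, this is known as optimizing for Type I or Type II errors.

However, I am confused about the circumstances under which precision and recall can be complete opposites of each other, such as Precision = 1 and Recall = 0.

Let me elaborate: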

precision = true positives / (true positives + false positives)

recall = true positives / (true positives + false negatives)

Here is the confusion matrix:

              predicted
            (+)   (-)
            ---------
       (+) | TP | FN |
actual      ---------
       (-) | FP | TN |
            ---------
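As a minimal sketch, the two definitions and the matrix above can be written out in plain Python. The helper names (`confusion_counts`, `precision`, `recall`) and the toy labels are hypothetical and chosen only for illustration; they are not part of my actual pipeline.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, FN, TN for the chosen positive class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive and a != positive:
            fp += 1
        elif p != positive and a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision(tp, fp):
    # Returns 0.0 when nothing was predicted positive (a convention, not a rule).
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    # Returns 0.0 when there are no actual positives (a convention, not a rule).
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy labels, made up for illustration only.
actual = [1, 1, 1, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)
print("TP, FP, FN, TN:", tp, fp, fn, tn)
print("precision:", precision(tp, fp))
print("recall:", recall(tp, fn))
```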

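Now, if Precision = 1 for the positive (1) class of a classifier, it means there are no FPs and every predicted positive label is a TP.

So how can Recall be 0 for the same positive class? If some TPs were predicted (and, according to the precision, every predicted positive is a TP), then the numerator of Recall is non-zero. Under what circumstances, then, can Recall be 0 for the positive class of the same classifier?

To give some context, I ran a logistic regression classifier on a binary classification problem. I have about 23K training examples with 774 features; 770 of the features are binary or dummy variables.

Here is the distribution of my class labels: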

1    20429
0    12559

Below are the confusion matrix and precision values after a 5-fold grid search over 25 combinations of hyperparameter values.

The mean train scores are [ 0.66883049  0.54314532  0.67008959  0.63187226  0.63100366  0.53165968
  0.54131812  0.55507725  0.5578254   0.57663273  0.57247462  0.57230056
  0.54402055  0.5762753   0.50925733  0.45781882  0.39366017  0.39037968
  0.3919818   0.38878762  0.39784982  0.39506755  0.48238147  0.38932944
  0.39801223]

The mean validation scores are [ 0.66445801  0.54107661  0.66878871  0.63184791  0.6305487   0.5291239
  0.53899788  0.55324585  0.55822615  0.57784418  0.57269066  0.57312373
  0.54536399  0.57593868  0.50790351  0.45727773  0.39318349  0.38906933
  0.39214413  0.38924256  0.39794725  0.39461262  0.4827855   0.38811658
  0.39812048]

The score on held out data is: 0.6687887055562773
 Hyper-Parameters for Best Score : {'alpha': 0.0001, 'l1_ratio': 0.45}

The accuracy of sgd on test data is: 0.37526523188845107

Classification Metrics for sgd :
             precision    recall  f1-score   support

          0       0.38      1.00      0.55      3712
          1       1.00      0.00      0.00      6185

avg / total       0.77      0.38      0.21      9897

Answer 1

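The output you posted is rounded to two decimal places, so you could have precision = 1 and a recall of, say, 0.001, which prints as 0.00. This can happen, for example, if only one case is (correctly) predicted as 1 and everything else is predicted as 0. Your false negative rate is then very high, and your recall is 1/6185, which is close to 0.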
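To make the rounding effect concrete, here is a small sketch in plain Python. The counts are an assumption: a single true positive and no false positives, with the class supports (6185 and 3712) taken from the report in the question. The real confusion matrix was not posted, so the actual counts may differ.

```python
# Hypothetical counts: one correct positive prediction, everything else predicted 0.
support_pos = 6185   # actual class-1 examples (from the posted report)
support_neg = 3712   # actual class-0 examples (from the posted report)

tp, fp = 1, 0
fn = support_pos - tp   # positives the classifier missed
tn = support_neg - fp   # negatives correctly predicted as 0

precision_1 = tp / (tp + fp)
recall_1 = tp / (tp + fn)
precision_0 = tn / (tn + fn)
recall_0 = tn / (tn + fp)

# Formatted to two decimals, as in the posted classification report.
print(f"class 0: precision={precision_0:.2f} recall={recall_0:.2f}")
print(f"class 1: precision={precision_1:.2f} recall={recall_1:.2f}")
print(f"unrounded class-1 recall: {recall_1:.6f}")
```

Under these assumed counts, the class-1 recall is small but non-zero, and it only looks like 0 once it is formatted to two decimal places.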

Understanding precision and recall results for a binary classifier