Question

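I am using Spark MLlib's BinaryClassificationMetrics class to generate metrics for the output of a RandomForestClassificationModel. I have read the Spark documentation and was able to generate thresholds, precisionByThreshold, recallByThreshold, roc and pr.

I would like to know whether any specific threshold is used when roc is generated. I ask because the Wikipedia article on ROC says:

The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings.

I would like to know whether an optimal threshold is used when the ROC is generated in Spark. If not, why not?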

Answer 1

I believe it is 0.5. BinaryClassificationMetrics uses BinaryLabelCounter, whose label-counting method looks like this:

def +=(label: Double): BinaryLabelCounter = {
  // Though we assume 1.0 for positive and 0.0 for negative, the following check will handle
  // -1.0 for negative as well.
  if (label > 0.5) numPositives += 1L else numNegatives += 1L
  this
}
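
As a side note, the comment in the snippet above indicates that the 0.5 check is applied to the label (1.0 for positive, 0.0 or -1.0 for negative), which is separate from any cut-off on the predicted score. To illustrate the Wikipedia definition quoted in the question, here is a minimal sketch in plain Scala of a threshold sweep that produces one (FPR, TPR) point per distinct score. It is not Spark's implementation: the object RocSketch, the helper rocPoints and the sample data are hypothetical names and values chosen only for illustration.

```scala
object RocSketch {
  // Hypothetical helper, not part of Spark: returns one (FPR, TPR) point
  // per distinct score, using each score as a candidate threshold.
  def rocPoints(scoreAndLabels: Seq[(Double, Double)]): Seq[(Double, Double)] = {
    // Same label convention as the snippet above: label > 0.5 means positive.
    val numPos = scoreAndLabels.count { case (_, label) => label > 0.5 }
    val numNeg = scoreAndLabels.size - numPos
    require(numPos > 0 && numNeg > 0, "need at least one positive and one negative example")

    // Candidate thresholds: every distinct score, highest first.
    val thresholds = scoreAndLabels.map(_._1).distinct.sorted.reverse

    val curve = thresholds.map { t =>
      // Everything scored at or above the threshold is predicted positive.
      val predictedPos = scoreAndLabels.filter { case (score, _) => score >= t }
      val tp = predictedPos.count { case (_, label) => label > 0.5 }
      val fp = predictedPos.size - tp
      (fp.toDouble / numNeg, tp.toDouble / numPos)
    }

    // Start the curve at the origin (a threshold above every score).
    (0.0, 0.0) +: curve
  }

  def main(args: Array[String]): Unit = {
    // Made-up (score, label) pairs for illustration only.
    val data = Seq((0.9, 1.0), (0.8, 1.0), (0.7, 0.0), (0.4, 1.0), (0.2, 0.0))
    rocPoints(data).foreach { case (fpr, tpr) => println(f"FPR=$fpr%.2f TPR=$tpr%.2f") }
  }
}
```

In this sketch no single threshold is picked: the curve is the set of points obtained across all of them, so choosing an operating threshold would be a separate step done after inspecting the curve.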
