Question

Using this code:

from sklearn import metrics
import numpy as np
import matplotlib.pyplot as plt

y_true = [1,0,0]
y_predict = [.6,.1,.1]

fpr, tpr, thresholds = metrics.roc_curve(y_true, y_predict , pos_label=1)

print(fpr)
print(tpr)
print(thresholds)

# Plot ROC curve
plt.plot(fpr,tpr)
plt.show()


y_true = [1,0,0]
y_predict = [.6,.1,.6]

fpr, tpr, thresholds = metrics.roc_curve(y_true, y_predict , pos_label=1)

print(fpr)
print(tpr)
print(thresholds)

# Plot ROC curve
plt.plot(fpr,tpr)
plt.show()

This plots the following ROC curves:

scikit-learn chooses the thresholds itself, but I want to set custom thresholds.

For example, for the values:

y_true = [1,0,0]
y_predict = [.6,.1,.6]

the following thresholds are returned:

[1.6 0.6 0.1]

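Why doesn't the value 1.6 appear in the ROC curve? Is the threshold 1.6 redundant in this case, since probabilities range from 0 to 1? Is it possible to set custom thresholds such as .3, .5 and .7 to check how the classifier performs in this case?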

Update:

Using the same labels and predicted values as in https://sachinkalsi.github.io/blog/category/ml/2018/08/20/top-8-performance-metrics-one-should-know.html#receiver-operating-characteristic-curve-roc:

from sklearn import metrics
import numpy as np
import matplotlib.pyplot as plt

y_true = [1,1,1,0]
y_predict = [.94,.87,.83,.80]

fpr, tpr, thresholds = metrics.roc_curve(y_true, y_predict , pos_label=1)

print('false positive rate:', fpr)
print('true positive rate:', tpr)
print('thresholds:', thresholds)

# Plot ROC curve
plt.plot(fpr,tpr)
plt.show()

produces this plot:

The plot differs from the one shown in the blog, and so do the thresholds:

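Also, the thresholds returned by scikit-learn's metrics.roc_curve are: thresholds: [0.94 0.83 0.8 ]. Shouldn't scikit-learn return a similar ROC curve when given the same points? Should I implement the ROC curve myself instead of relying on scikit-learn's implementation, since the results differ?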
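One likely reason for the missing 0.87 threshold (an assumption, since the blog's own code isn't shown): metrics.roc_curve defaults to drop_intermediate=True, which discards thresholds whose (FPR, TPR) point lies on a straight segment between its neighbours. Here 0.87 sits between 0.94 and 0.83 on the vertical FPR = 0 segment, so it is dropped without changing the shape of the curve. The sketch below, assuming a reasonably recent scikit-learn, compares the default with drop_intermediate=False, which keeps one threshold per distinct score. Depending on the version, an extra leading threshold may also be added so the curve starts at (0, 0).

```python
from sklearn import metrics

y_true = [1, 1, 1, 0]
y_predict = [.94, .87, .83, .80]

# Default behaviour: thresholds whose ROC point is collinear with its
# neighbours are removed, because they do not change the curve.
fpr_d, tpr_d, thr_d = metrics.roc_curve(y_true, y_predict, pos_label=1)

# Keep every distinct score as a threshold.
fpr, tpr, thr = metrics.roc_curve(y_true, y_predict, pos_label=1,
                                  drop_intermediate=False)

print('default thresholds :', thr_d)
print('all thresholds     :', thr)
print('FPR (all):', fpr)
print('TPR (all):', tpr)
```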

Answer 1

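The thresholds themselves do not appear on the ROC curve; each one only determines a single (FPR, TPR) point. The scikit-learn documentation says: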

thresholds[0] represents no instances being predicted and is arbitrarily set to max(y_score) + 1
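So 1.6 is not redundant: it is the cut-off that produces the (0, 0) end of the curve, where nothing is predicted positive. Zipping the three arrays returned by roc_curve makes the threshold-to-point mapping explicit. Note that newer scikit-learn releases (1.3 and later, as far as I know) set thresholds[0] to np.inf instead of max(y_score) + 1, so the first printed value depends on your version.

```python
import numpy as np
from sklearn import metrics

y_true = [1, 0, 0]
y_predict = [.6, .1, .6]

fpr, tpr, thresholds = metrics.roc_curve(y_true, y_predict, pos_label=1)

# Each threshold corresponds to exactly one (FPR, TPR) point on the curve.
for t, f, r in zip(thresholds, fpr, tpr):
    print(f'threshold={t}: FPR={f:.2f}, TPR={r:.2f}')

# thresholds[0] is above every score, so no sample is predicted positive
# and the curve starts at (0, 0).
scores = np.asarray(y_predict)
n_predicted_positive = int((scores >= thresholds[0]).sum())
print('samples predicted positive at thresholds[0]:', n_predicted_positive)
```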

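roc_curve only uses the distinct values in y_predict as candidate thresholds and has no parameter for supplying your own. If y_predict contained 0.3, 0.5 and 0.7, those thresholds would be tried by metrics.roc_curve.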
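To evaluate arbitrary thresholds such as .3, .5 and .7, you can compute FPR and TPR yourself. The helper below is a minimal sketch; the name rates_at_thresholds is hypothetical, not a scikit-learn function. It applies the same "score >= threshold means positive" rule and assumes both classes are present in y_true (otherwise one of the divisions is by zero).

```python
import numpy as np


def rates_at_thresholds(y_true, y_score, thresholds):
    """Hypothetical helper: return (threshold, FPR, TPR) for each threshold.

    A sample is predicted positive when its score is >= the threshold.
    Assumes y_true contains at least one positive and one negative.
    """
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    n_pos = y_true.sum()
    n_neg = (~y_true).sum()
    rows = []
    for t in thresholds:
        pred = y_score >= t
        tp = np.sum(pred & y_true)   # positives correctly flagged
        fp = np.sum(pred & ~y_true)  # negatives wrongly flagged
        rows.append((t, fp / n_neg, tp / n_pos))
    return rows


y_true = [1, 0, 0]
y_predict = [.6, .1, .6]
for t, fpr, tpr in rates_at_thresholds(y_true, y_predict, [.3, .5, .7]):
    print(f'threshold={t}: FPR={fpr:.2f}, TPR={tpr:.2f}')
```

With these three samples, .3 and .5 give the same point because no score falls between them; only thresholds that cross a score change the result.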

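The following steps are generally used to compute an ROC curve:

1. Sort y_predict in descending order.

2. For each probability score τ_i in y_predict, treat a data point as positive if its y_predict >= τ_i.

PS: If we have N data points, we get N thresholds (assuming all y_predict values are distinct).

3. For each threshold τ_i, compute the TPR and FPR.

4. Plot the ROC curve from the N resulting (FPR, TPR) pairs, where N is the number of data points.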
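The steps above can be sketched in plain NumPy. This is a minimal illustration, not scikit-learn's implementation: manual_roc is a hypothetical name, tied scores are collapsed into one threshold, and a (0, 0) starting point is prepended the way recent scikit-learn versions do. No intermediate points are dropped, so the result should match roc_curve(..., drop_intermediate=False).

```python
import numpy as np


def manual_roc(y_true, y_score):
    """Hypothetical helper following steps 1-3: return FPR, TPR and thresholds."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    n_pos = y_true.sum()
    n_neg = (~y_true).sum()

    # Step 1: distinct scores in descending order (np.unique sorts ascending).
    thresholds = np.unique(y_score)[::-1]

    fpr, tpr = [0.0], [0.0]  # start at (0, 0): nothing predicted positive
    for tau in thresholds:
        pred = y_score >= tau                          # step 2
        tpr.append(np.sum(pred & y_true) / n_pos)      # step 3
        fpr.append(np.sum(pred & ~y_true) / n_neg)
    return np.array(fpr), np.array(tpr), thresholds


fpr, tpr, thr = manual_roc([1, 1, 1, 0], [.94, .87, .83, .80])
print('FPR:', fpr)
print('TPR:', tpr)
print('thresholds:', thr)
# Step 4: plot, e.g. with matplotlib: plt.plot(fpr, tpr)
```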

You can refer to this blog for details.

How to read this ROC curve and set custom thresholds?
