Question

I'm trying to reproduce a sensitivity/specificity plot similar to this one, where the x-axis is the threshold:

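I haven't found a way to do it. Some scikit-learn metrics, such as the ROC curve, return the true positives and false positives, but I haven't found any option for drawing this kind of plot from them.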

I'm trying to compare the probabilities with the actual labels to keep the counts, and the plot I get looks like this:

So the x values must need some kind of normalization so that the curves can actually go up and down.

Answer 1

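I don't think that plot shows what you think it shows. As the threshold drops to zero, the sensitivity approaches 1, since 100% of the observations are categorized as positive and the false negative rate drops to zero. Likewise, the selectivity (specificity) approaches 1 as the threshold approaches 1, since every observation is categorized as negative and the false positive rate is zero. So the plot is not showing sensitivity or selectivity.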
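To make the limiting behaviour concrete, here is a minimal sketch that counts the confusion-matrix cells by hand. The scores are synthetic (drawn from two beta distributions purely for illustration), and the helper name `sensitivity_specificity` is hypothetical, not part of any library. "Specificity" here is what this answer calls selectivity.

```python
import numpy as np

# Synthetic scores (an assumption for illustration): negatives tend to
# score low, positives tend to score high.
rng = np.random.default_rng(0)
y_true = np.array([0] * 50 + [1] * 50)
scores = np.concatenate((rng.beta(2, 5, 50), rng.beta(5, 2, 50)))

def sensitivity_specificity(y_true, scores, threshold):
    """Return (sensitivity, specificity) when scores >= threshold count as positive."""
    predicted = scores >= threshold
    tp = np.sum(predicted & (y_true == 1))
    fn = np.sum(~predicted & (y_true == 1))
    tn = np.sum(~predicted & (y_true == 0))
    fp = np.sum(predicted & (y_true == 0))
    return tp / (tp + fn), tn / (tn + fp)

for t in (0.0, 0.5, 1.0):
    sens, spec = sensitivity_specificity(y_true, scores, t)
    print(f"threshold={t:.1f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

At a threshold of 0 everything is predicted positive, so sensitivity is 1 and specificity is 0; at a threshold of 1 the situation is reversed.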

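To plot selectivity and sensitivity as functions of the threshold on the x-axis, we can use scikit-learn's built-in metrics and extract the values to plot them our own way. Given a vector of binary labels test_y, a matrix of associated predictors test_x, and a fitted RandomForestClassifier object rfc: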

import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import recall_score

# Get the estimated probability of each observation being categorized as positive.
# The columns of predict_proba follow rfc.classes_, so with labels 0/1 column 1 is
# the positive class (use [:, 0] for the probability of negative).
predicted_y_probs = rfc.predict_proba(test_x)[:, 1]

thresholds = np.linspace(0, 1, 20)  # or however many points you want

# Sensitivity is recall on the positive class; selectivity (specificity) is
# recall on the negative class, not precision.
sensitivities = [recall_score(test_y, predicted_y_probs >= t) for t in thresholds]
selectivities = [recall_score(test_y, predicted_y_probs >= t, pos_label=0) for t in thresholds]
plt.plot(thresholds, sensitivities, label='sensitivity')
plt.plot(thresholds, selectivities, label='selectivity')
plt.legend()
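As an alternative sketch, `sklearn.metrics.roc_curve` returns the same quantities in a single pass: the true positive rate is the sensitivity, and one minus the false positive rate is the selectivity (specificity). The data below are synthetic stand-ins for test_y and predicted_y_probs, assumed only so the snippet runs on its own.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

# Synthetic stand-ins for test_y and predicted_y_probs (illustration only).
rng = np.random.default_rng(0)
test_y = np.array([0] * 100 + [1] * 100)
predicted_y_probs = np.concatenate((rng.beta(2, 5, 100), rng.beta(8, 3, 100)))

# drop_intermediate=False keeps every distinct score as a threshold.
fpr, tpr, thresholds = roc_curve(test_y, predicted_y_probs, drop_intermediate=False)

# The first threshold is a sentinel above every score (its exact value
# depends on the scikit-learn version), so skip it when plotting.
thresholds = thresholds[1:]
sensitivities = tpr[1:]
specificities = 1 - fpr[1:]

plt.plot(thresholds, sensitivities, label='sensitivity')
plt.plot(thresholds, specificities, label='specificity')
plt.xlabel('threshold')
plt.legend()
```

The thresholds come back in decreasing order, which matplotlib handles without sorting. In a script, call plt.show() afterwards to display the figure.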

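However, this won't recreate the plot you provided as a reference, which appears to show the distribution of the estimated probability of each observation being categorized as positive. In other words, the threshold in that plot is a constant, and the x-axis shows where each prediction falls relative to that (fixed) threshold. It doesn't directly tell us the sensitivity or the selectivity. If you really want a plot like that, read on.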

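I can't think of a way to reconstruct those smooth curves, since a density plot would extend below zero and above 1, but we can show the same information with histograms. Using the same variables as before: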

# Specify range to ensure both groups show up the same width.
bins = np.linspace(0,1,10)

# Show distributions of estimated probabilities for the two classes.
plt.hist(predicted_y_probs[test_y == 1], alpha=0.5, color='red', label='positive', bins=bins)
plt.hist(predicted_y_probs[test_y == 0], alpha=0.5, color='green', label='negative', bins=bins)

# Show the threshold.
plt.axvline(0.5, c='black', ls='dashed')

# Add labels
plt.legend()

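I ran this code on the classic Iris dataset, using only two of its three species, and got the following output. Versicolor is "positive", virginica is "negative", and setosa was ignored to make the classification binary. Note that my model had perfect recall, so all the probabilities for versicolor are very close to 1.0. The histogram is rather coarse because there are only 100 samples, most of which were categorized correctly, but hopefully it gets the idea across.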

Answer 2

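Building on @ApproachingDarknessFish's answer, you can fit a variety of distributions to the resulting histograms, not all of which extend outside [0,1]. For example, the beta distribution does a decent job of capturing most unimodal distributions on [0,1], at least for visualization purposes: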

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats

test_y = np.array([0]*100 + [1]*100)
predicted_y_probs = np.concatenate((np.random.beta(2,5,100), np.random.beta(8,3,100)))

def estimate_beta(X):
    xbar = np.mean(X)
    vbar = np.var(X,ddof=1)
    alphahat = xbar*(xbar*(1-xbar)/vbar - 1)
    betahat = (1-xbar)*(xbar*(1-xbar)/vbar - 1)
    return alphahat, betahat

positive_beta_estimates = estimate_beta(predicted_y_probs[test_y == 1])
negative_beta_estimates = estimate_beta(predicted_y_probs[test_y == 0])

unit_interval = np.linspace(0,1,100)
plt.plot(unit_interval, scipy.stats.beta.pdf(unit_interval, *positive_beta_estimates), c='r', label="positive")
plt.plot(unit_interval, scipy.stats.beta.pdf(unit_interval, *negative_beta_estimates), c='g', label="negative")

# Show the threshold.
plt.axvline(0.5, c='black', ls='dashed')
plt.xlim(0,1)

# Add labels
plt.legend()
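The fitted curves can also be tied back to sensitivity and specificity. As a rough sketch under the same synthetic-data assumption: the area of the positive curve to the right of the threshold is the sensitivity implied by the fit, and the area of the negative curve to the left of it is the implied specificity. The variable names below are illustrative, and the moment estimator is the same one used above.

```python
import numpy as np
import scipy.stats

# Same kind of synthetic data as above (illustration only), with a fixed seed.
rng = np.random.default_rng(0)
test_y = np.array([0] * 100 + [1] * 100)
predicted_y_probs = np.concatenate((rng.beta(2, 5, 100), rng.beta(8, 3, 100)))

def estimate_beta(X):
    """Method-of-moments estimates of the beta shape parameters."""
    xbar = np.mean(X)
    vbar = np.var(X, ddof=1)
    common = xbar * (1 - xbar) / vbar - 1
    return xbar * common, (1 - xbar) * common

pos_a, pos_b = estimate_beta(predicted_y_probs[test_y == 1])
neg_a, neg_b = estimate_beta(predicted_y_probs[test_y == 0])

threshold = 0.5
# Area of the positive curve to the right of the threshold.
model_sensitivity = scipy.stats.beta.sf(threshold, pos_a, pos_b)
# Area of the negative curve to the left of the threshold.
model_specificity = scipy.stats.beta.cdf(threshold, neg_a, neg_b)

# Empirical counterparts computed directly from the scores.
empirical_sensitivity = np.mean(predicted_y_probs[test_y == 1] >= threshold)
empirical_specificity = np.mean(predicted_y_probs[test_y == 0] < threshold)

print(f"sensitivity: fitted={model_sensitivity:.3f}  empirical={empirical_sensitivity:.3f}")
print(f"specificity: fitted={model_specificity:.3f}  empirical={empirical_specificity:.3f}")
```

The fitted and empirical values should be close when the beta distribution describes the scores well; a large gap suggests the fit is only suitable for visualization.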
