Question

I want to see how precision and recall vary with the threshold (not just against each other).

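from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_curve

model = RandomForestClassifier(n_estimators=500, n_jobs=-1)
model.fit(X_train, y_train)
# Probability of the positive class for each test sample
probas = model.predict_proba(X_test)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_test, probas)
print(len(precision))
print(len(thresholds))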

This returns:

283  
282

So I can't plot them against each other. Any clue as to why this happens?

Answer 1

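For this problem, ignore the last precision and recall values. The last precision and recall values are always 1. and 0. respectively, and they have no corresponding threshold.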
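To see where the extra element comes from, here is a minimal pure-Python sketch. It is not scikit-learn's implementation, and the helper name `precision_recall_points` and the toy data are made up for illustration. It computes one precision/recall pair per distinct score used as a threshold, then appends the final (1.0, 0.0) pair described above, so the precision and recall lists end up one element longer than the threshold list.

```python
def precision_recall_points(y_true, scores):
    """Illustrative sketch only (hypothetical helper, not the library code).

    Assumes y_true holds 0/1 labels with at least one positive.
    """
    thresholds = sorted(set(scores))
    total_pos = sum(y_true)
    precision, recall = [], []
    for t in thresholds:
        # A sample is predicted positive when its score is at least t.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        precision.append(tp / (tp + fp))
        recall.append(tp / total_pos)
    # Extra end point with no matching threshold.
    precision.append(1.0)
    recall.append(0.0)
    return precision, recall, thresholds


# Toy data, made up for this example
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
precision, recall, thresholds = precision_recall_points(y_true, scores)
print(len(precision), len(recall), len(thresholds))  # 5 5 4
```

The exact set of thresholds scikit-learn returns can differ between versions, but the one-element offset between `thresholds` and the other two arrays is the same.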

For example, here is one solution:

import matplotlib.pyplot as plt

def plot_precision_recall_vs_threshold(precisions, recalls, thresholds):
    # Drop the final precision/recall pair, which has no matching threshold.
    plt.figure(figsize=(8, 5))
    plt.plot(thresholds, precisions[:-1], "b--", label="Precision")
    plt.plot(thresholds, recalls[:-1], "g-", label="Recall")
    plt.xlabel("Threshold")
    plt.legend()
    plt.show()

plot_precision_recall_vs_threshold(precision, recall, thresholds)

These values are there so that, when you plot precision against recall, the curve starts on the y-axis (x = 0).
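If you want the numbers in a table rather than a plot, the same slicing applies. The sketch below uses a hypothetical helper, `rows_by_threshold`, and made-up values. It checks the expected one-element offset and then pairs each threshold with its precision and recall.

```python
def rows_by_threshold(precision, recall, thresholds):
    """Pair each threshold with its precision and recall (hypothetical helper)."""
    if not (len(precision) == len(recall) == len(thresholds) + 1):
        raise ValueError("expected precision and recall to be one longer than thresholds")
    # The last precision/recall pair has no threshold, so it is dropped.
    return list(zip(thresholds, precision[:-1], recall[:-1]))


# Made-up values for illustration only
example_precision = [0.5, 0.75, 1.0, 1.0]
example_recall = [1.0, 0.75, 0.25, 0.0]
example_thresholds = [0.2, 0.5, 0.9]

rows = rows_by_threshold(example_precision, example_recall, example_thresholds)
for t, p, r in rows:
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```

The slicing works the same way on the NumPy arrays returned by `precision_recall_curve`.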

In scikit's precision_recall_curve, why do the thresholds have a different dimension from recall and precision?
