I am simulating a search engine that retrieves 10 documents, of which only 5 are relevant.
from sklearn import svm, datasets
from sklearn.model_selection import train_test_split
import numpy as np
from sklearn.metrics import precision_recall_curve
import matplotlib.pyplot as plt
from sklearn.metrics import average_precision_score
from sklearn.metrics import roc_curve
y_true = np.array([True, True, False, True, False, True, False, False, False, True])
Lowering the threshold retrieves more documents:
y_scores = np.array([1, .9, .8, .7, .6, .5, .4, .3, .2, .1])
Now compute the precisions, recalls, and thresholds:
precisions, recalls, thresholds1 = precision_recall_curve(y_true, y_scores)
print("\nPrecisions:")
for pr in precisions:
    print('{0:0.2f}'.format(pr), end='; ')
print("\nRecalls:")
for rec in recalls:
    print('{0:0.2f}'.format(rec), end='; ')
print("\nThresholds:")
for thr in thresholds1:
    print('{0:0.2f}'.format(thr), end='; ')
Output 1
Precisions:
0.50; 0.44; 0.50; 0.57; 0.67; 0.60; 0.75; 0.67; 1.00; 1.00; 1.00;
Recalls:
1.00; 0.80; 0.80; 0.80; 0.80; 0.60; 0.60; 0.40; 0.40; 0.20; 0.00;
Thresholds:
0.10; 0.20; 0.30; 0.40; 0.50; 0.60; 0.70; 0.80; 0.90; 1.00;
Code for output 2:
falsePositiveRates, truePositiveRates, thresholds2 = roc_curve(y_true, y_scores, pos_label = True)
print("\nFPRs:")
for fpr in falsePositiveRates:
    print('{0:0.2f}'.format(fpr), end='; ')
print("\nTPRs:")
for tpr in truePositiveRates:
    print('{0:0.2f}'.format(tpr), end='; ')
print("\nThresholds:")
for thr in thresholds2:
    print('{0:0.2f}'.format(thr), end='; ')
Output 2
FPRs:
0.00; 0.00; 0.20; 0.20; 0.40; 0.40; 1.00; 1.00;
TPRs:
0.20; 0.40; 0.40; 0.60; 0.60; 0.80; 0.80; 1.00;
Thresholds:
1.00; 0.90; 0.80; 0.70; 0.60; 0.50; 0.20; 0.10;
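Question: In output 1, why is the last precision (which would be the first point on the plot) computed as 1 rather than 0?
In output 2, why do the FPR, TPR, and threshold arrays have length 8 instead of 10?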
Answer 0 (score: 2)
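In output 1, why is the last precision (which would be the first point plotted) set to 1 rather than 0?
At the most restrictive threshold, you select only one item, and it is relevant (a true positive), so the precision is 1/1 = 1.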
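A minimal sketch that may help check this by hand: it recomputes precision and recall at each threshold directly from the question's `y_true` and `y_scores`, without scikit-learn. The helper name `precision_recall_at` is hypothetical and only for illustration. At the highest threshold (1.0) exactly one document is retrieved and it is relevant. As far as the scikit-learn documentation describes it, `precision_recall_curve` also appends a final point (precision 1, recall 0) that has no corresponding threshold, which would explain why there are 11 precision values for 10 thresholds.

```python
import numpy as np

y_true = np.array([True, True, False, True, False, True, False, False, False, True])
y_scores = np.array([1, .9, .8, .7, .6, .5, .4, .3, .2, .1])

def precision_recall_at(threshold, y_true, y_scores):
    """Hypothetical helper: precision and recall when every item
    scoring >= threshold counts as retrieved."""
    retrieved = y_scores >= threshold
    true_positives = np.sum(retrieved & y_true)
    precision = true_positives / np.sum(retrieved)
    recall = true_positives / np.sum(y_true)
    return precision, recall

# Walk the thresholds from least to most restrictive,
# in the same order as Output 1.
for thr in sorted(y_scores):
    p, r = precision_recall_at(thr, y_true, y_scores)
    print('thr={0:0.2f}  precision={1:0.2f}  recall={2:0.2f}'.format(thr, p, r))
```

The row for `thr=1.00` is the one the answer refers to: one retrieved item, one true positive.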
In output 2, why do FPR, TPR, and the thresholds have 8 entries instead of 10?
You left drop_intermediate at its default of True. 0.3 and 0.4 are suboptimal thresholds.
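To see which thresholds are dropped, one option is to call `roc_curve` twice, once with `drop_intermediate=False` and once with the default, and compare the threshold arrays. This is a sketch using the question's data; the variable names are illustrative. Note that recent scikit-learn versions also prepend an extra starting point to the curve, so the array lengths may be one larger than the 8 and 10 discussed here.

```python
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([True, True, False, True, False, True, False, False, False, True])
y_scores = np.array([1, .9, .8, .7, .6, .5, .4, .3, .2, .1])

# Keep every threshold.
fpr_all, tpr_all, thr_all = roc_curve(
    y_true, y_scores, pos_label=True, drop_intermediate=False)

# Default behaviour: drop thresholds that would not change the plotted curve.
fpr_opt, tpr_opt, thr_opt = roc_curve(
    y_true, y_scores, pos_label=True, drop_intermediate=True)

# Thresholds present in the full curve but missing from the reduced one.
dropped = np.setdiff1d(thr_all, thr_opt)
print("lengths:", len(thr_all), len(thr_opt))
print("dropped thresholds:", dropped)
```

On this data the dropped thresholds should be 0.3 and 0.4. At both, the TPR stays at 0.80 while only the FPR rises, so the points lie on the straight segment between their neighbours and removing them leaves the plotted ROC curve unchanged.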