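When plotting a learning curve with sklearn.model_selection.learning_curve() on a boolean supervised classifier, the weighted F1 score is shown by default.
However, I want to plot the F1 score of one specific class, in this case the positive class (i.e. 1).
In the context below (from sklearn.metrics.classification_report), what gets plotted is the avg / total row, but I want to plot the metric for class 1.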
[Plot: learning curve image]
Code:
...
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import ShuffleSplit, learning_curve

estimator = classifier_class()
cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
train_sizes, train_scores, test_scores = learning_curve(
    estimator, X_recombined, y_recombined, cv=cv)  # n_jobs=n_jobs, train_sizes=train_sizes

# Mean and standard deviation across the CV splits for each training size
train_scores_mean = np.mean(train_scores, axis=1)
train_scores_std = np.std(train_scores, axis=1)
test_scores_mean = np.mean(test_scores, axis=1)
test_scores_std = np.std(test_scores, axis=1)

plt.grid()
# Shaded bands: mean +/- one standard deviation
plt.fill_between(train_sizes,
                 train_scores_mean - train_scores_std,
                 train_scores_mean + train_scores_std,
                 alpha=0.1, color="r")
plt.fill_between(train_sizes,
                 test_scores_mean - test_scores_std,
                 test_scores_mean + test_scores_std,
                 alpha=0.1, color="g")
plt.plot(train_sizes, train_scores_mean, 'o-', color="r", label="Training score")
plt.plot(train_sizes, test_scores_mean, 'o-', color="g", label="Cross-validation score")
plt.legend(loc="best")
Is this possible?
Answer 0 (score: 0)
You can pass a custom scorer through the scoring parameter of learning_curve. From the documentation:
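scoring : string, callable or None, optional, default: None
A string (see the model evaluation documentation) or a scorer callable object / function with signature scorer(estimator, X, y).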
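To make the callable form of scoring concrete, here is a minimal sketch of a scorer written directly with the scorer(estimator, X, y) signature. The synthetic data, the LogisticRegression estimator, and the names f1_class_1, X_demo, y_demo and cv_demo are assumptions made for illustration only; they are not part of the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import ShuffleSplit, learning_curve

# Synthetic binary data, used purely for illustration (assumption).
X_demo, y_demo = make_classification(
    n_samples=400, n_features=8, weights=[0.6, 0.4], random_state=0)


def f1_class_1(estimator, X, y):
    """Scorer with the (estimator, X, y) signature: F1 of class 1 only."""
    return f1_score(y, estimator.predict(X), pos_label=1, average="binary")


cv_demo = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
sizes, train_scores, test_scores = learning_curve(
    LogisticRegression(max_iter=1000), X_demo, y_demo,
    cv=cv_demo, scoring=f1_class_1)

# One row per training size, one column per CV split.
print(sizes)
print(test_scores.mean(axis=1))
```

Because the function receives the fitted estimator itself, it can call predict (or predict_proba) however it likes, which is handy when the metric needs more than plain predictions.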
In addition, the documentation of the sklearn.metrics.f1_score function says:
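pos_label : str or int, 1 by default
The class to report if average='binary' and the data is binary. If the data are multiclass or multilabel, this will be ignored; setting labels=[pos_label] and average != 'binary' will report scores for that label only.
average : string, [None, 'binary' (default), 'micro', 'macro', 'samples', 'weighted']
This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data:
'binary': Only report results for the class specified by pos_label. This is applicable only if targets (y_{true,pred}) are binary.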
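As a quick illustration of how pos_label and average interact, the sketch below scores a small hand-made set of labels (hypothetical data, chosen only so the numbers are easy to check by hand).

```python
from sklearn.metrics import f1_score

# Hand-made labels for illustration: five samples of class 0, three of class 1.
y_true = [0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 0]

# F1 of the positive class only (precision 2/3, recall 2/3).
f1_pos = f1_score(y_true, y_pred, pos_label=1, average="binary")

# F1 of class 0 only (precision 4/5, recall 4/5).
f1_neg = f1_score(y_true, y_pred, pos_label=0, average="binary")

# average=None returns one score per class, ordered by label.
per_class = f1_score(y_true, y_pred, average=None)

# Support-weighted mean of the per-class scores: (5 * 0.8 + 3 * 2/3) / 8.
f1_weighted = f1_score(y_true, y_pred, average="weighted")

print(f1_pos, f1_neg, per_class, f1_weighted)
```

Here the class 1 score (about 0.667) is lower than the weighted score (0.75), which is the kind of gap that an averaged curve hides.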
So you can do something like this:
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import learning_curve

# Custom scorer: F1 score of a single class
target = 1  # class you want to plot (the positive class in the question)
scorer = make_scorer(f1_score, pos_label=target, average='binary')

# estimator, X, y and cv are the objects defined in the question
train_sizes, train_scores, test_scores = learning_curve(
    estimator,
    X,
    y,
    cv=cv,
    scoring=scorer)
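For a self-contained variant, the following sketch builds one make_scorer per class and plots the cross-validation F1 curve of class 0 and class 1 side by side. The synthetic dataset, the LogisticRegression estimator, and the names X_toy, y_toy, cv_toy and mean_curves are assumptions for illustration; substitute your own estimator and data.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import ShuffleSplit, learning_curve

# Synthetic, mildly imbalanced binary data (assumption, for illustration only).
X_toy, y_toy = make_classification(
    n_samples=400, n_features=8, weights=[0.6, 0.4], random_state=0)
cv_toy = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)

mean_curves = {}
for label in (0, 1):
    # Extra keyword arguments to make_scorer are forwarded to f1_score.
    scorer = make_scorer(f1_score, pos_label=label, average="binary")
    sizes, _, test_scores = learning_curve(
        LogisticRegression(max_iter=1000), X_toy, y_toy,
        cv=cv_toy, scoring=scorer)
    # Average over the CV splits for each training size.
    mean_curves[label] = test_scores.mean(axis=1)
    plt.plot(sizes, mean_curves[label], "o-", label=f"F1, class {label}")

plt.grid()
plt.xlabel("Training examples")
plt.ylabel("Cross-validation F1")
plt.legend(loc="best")
# Call plt.show() or plt.savefig(...) to render the figure.
```

Since cv_toy has a fixed random_state, both curves are computed on the same splits, so the two lines are directly comparable.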