Operating system: Ubuntu 16.04 LTS
CPU: running on a Google Colab GPU runtime
C++/Python/R version: Python 3
I am calling lgb.cv as follows:
cvresult = lgb.cv(alg.get_params(), lgbtrain, num_boost_round=3000, nfold=5,
                  early_stopping_rounds=80, verbose_eval=True, seed=42,
                  metrics="multi_logloss", feval=f1_scorer)
I have defined f1_scorer (the function passed as feval to lgb.cv) as:
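from sklearn import metrics

def f1_scorer(y_pred, y):
    # y is the lgb.Dataset for the fold; get_label() returns the true labels
    y = y.get_label().astype("int")
    # reshape the flat predictions to (n_samples, 5) and pick the top class
    y_pred = y_pred.reshape((-1, 5)).argmax(axis=1)
    return "F1_scorer", metrics.f1_score(y, y_pred, average="weighted"), True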
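I reshape y_pred and take the argmax because I assume y_pred holds the class probabilities predicted during cv.
My multiclass model has 5 classes, and y_pred is 5 times as long as y (the true labels).
For example, if I have 10 examples, then y.shape is (10,). I expected y_pred.shape to be (10, 5), but it is (50,). (I know this because metrics.f1_score raised an error reporting the mismatched shapes.)
So I assumed that reshaping it to (-1, 5) would solve the problem.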
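One way to sanity-check the reshape is to reproduce it in plain NumPy. The sketch below is not LightGBM itself: proba is made-up data, and the class-major layout of the flat array is an assumption (older LightGBM documentation describes multiclass preds passed to feval as grouped by class_id first, accessed as preds[j * num_data + i]; this may differ between versions). Under that assumption, reshape((-1, 5)) does not recover the per-row probabilities, while reshape((5, -1)).T does.

```python
import numpy as np

num_data, num_class = 10, 5

# Hypothetical per-row class probabilities, shape (10, 5); each row sums to 1.
rng = np.random.default_rng(0)
proba = rng.dirichlet(np.ones(num_class), size=num_data)
true_argmax = proba.argmax(axis=1)

# ASSUMPTION: the flat array is class-major, i.e. flat[j * num_data + i]
# holds the score of row i for class j.
flat = proba.T.ravel()  # shape (50,)

# Reshape used in f1_scorer: treats the flat array as row-major.
row_major = flat.reshape((-1, num_class))

# Alternative: treat it as class-major, then transpose to (rows, classes).
class_major = flat.reshape((num_class, -1)).T

print(np.array_equal(class_major, proba))  # True under the assumed layout
print(np.array_equal(row_major, proba))    # False: rows are scrambled
# Fraction of rows whose argmax survives the row-major reshape.
print((row_major.argmax(axis=1) == true_argmax).mean())
```

If the real y_pred follows this layout, the argmax taken after reshape((-1, 5)) would be close to unrelated to the true predicted class, which would be consistent with an F1 score that does not move while the log loss improves.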
But something seems to be wrong with my definition of f1_scorer, because the F1 score does not increase while multi_logloss decreases substantially. Here is my output:
[1] cv_agg's multi_logloss: 1.48182 + 0.000107528 cv_agg's F1_scorer: 0.208046 + 0.00171022
[2] cv_agg's multi_logloss: 1.37942 + 0.000209271 cv_agg's F1_scorer: 0.20873 + 0.0017343
[3] cv_agg's multi_logloss: 1.30399 + 0.000401368 cv_agg's F1_scorer: 0.209169 + 0.00158037
[4] cv_agg's multi_logloss: 1.23172 + 0.00047433 cv_agg's F1_scorer: 0.209576 + 0.00178056
[5] cv_agg's multi_logloss: 1.16928 + 0.000577606 cv_agg's F1_scorer: 0.209329 + 0.00187392
[6] cv_agg's multi_logloss: 1.11477 + 0.000623601 cv_agg's F1_scorer: 0.209316 + 0.001725
[7] cv_agg's multi_logloss: 1.06698 + 0.000639912 cv_agg's F1_scorer: 0.209314 + 0.00166868
[8] cv_agg's multi_logloss: 1.0246 + 0.000678841 cv_agg's F1_scorer: 0.209319 + 0.0018861
.
. # skipped some outputs
.
[150] cv_agg's multi_logloss: 0.615405 + 0.00142301 cv_agg's F1_scorer: 0.209562 + 0.00165159
[151] cv_agg's multi_logloss: 0.615341 + 0.00142724 cv_agg's F1_scorer: 0.209498 + 0.0015317
[152] cv_agg's multi_logloss: 0.615274 + 0.0014286 cv_agg's F1_scorer: 0.209505 + 0.00161461
[153] cv_agg's multi_logloss: 0.615205 + 0.00143131 cv_agg's F1_scorer: 0.209524 + 0.0016036
[154] cv_agg's multi_logloss: 0.61514 + 0.00143731 cv_agg's F1_scorer: 0.20951 + 0.00160288
[155] cv_agg's multi_logloss: 0.615072 + 0.00143254 cv_agg's F1_scorer: 0.209491 + 0.00158067
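As you can see, multi_logloss drops from 1.5 to 0.6, but the F1 score stays flat at about 0.21.
The cv run was eventually stopped by early stopping on f1_scorer.
What am I doing wrong here? (I suspect it is the part of f1_scorer where I reshape y_pred, but I am not sure.)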