Using f-score in xgb

Time: 2016-02-15 01:54:34

Tags: xgboost

I'm trying to use the f-score from scikit-learn as the evaluation metric in an xgb classifier. Here is my code:

clf = xgb.XGBClassifier(max_depth=8, learning_rate=0.004,
                            n_estimators=100,
                            silent=False,   objective='binary:logistic',
                            nthread=-1, gamma=0,
                            min_child_weight=1, max_delta_step=0, subsample=0.8,
                            colsample_bytree=0.6,
                            base_score=0.5,
                            seed=0, missing=None)
scores = []
predictions = []
for train, test, ans_train, y_test in zip(trains, tests, ans_trains, ans_tests):
        clf.fit(train, ans_train, eval_metric=xg_f1,
                    eval_set=[(train, ans_train), (test, y_test)],
                    early_stopping_rounds=900)
        y_pred = clf.predict(test)
        predictions.append(y_pred)
        scores.append(f1_score(y_test, y_pred))

def xg_f1(y, t):
    t = t.get_label()
    return "f1", f1_score(t, y)

But there is an error:

  

Can't handle mix of binary and continuous

1 answer:

Answer 0 (score: 4)

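The problem is that f1_score is trying to compare non-binary predictions against binary targets, and by default this method uses binary averaging. From the documentation: "average : string, [None, 'binary' (default), 'micro', 'macro', 'samples', 'weighted']".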

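In any case, the error is telling you that your predictions look like [0.001, 0.7889, 0.33...] while your targets are binary [0, 1, 0...]. So if you know your threshold, I recommend preprocessing the predictions before passing them to f1_score. A common value for the threshold is 0.5.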
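
To make the mismatch concrete, here is a minimal sketch that reproduces the error with f1_score alone and then applies the thresholding suggested above. The arrays are illustrative values, not the asker's data, and the 0.5 threshold is simply the common default mentioned in the answer.

```python
import numpy as np
from sklearn.metrics import f1_score

# Illustrative values only; they are not taken from the question's data.
y_true = np.array([0, 1, 0, 1])
y_prob = np.array([0.001, 0.7889, 0.33, 0.6])

# Passing raw probabilities: f1_score sees a binary target and a
# continuous prediction, and refuses to compare them.
try:
    f1_score(y_true, y_prob)
    raised = False
except ValueError as err:
    raised = True
    print("f1_score rejected the raw probabilities:", err)

# Binarize with a threshold first, then score.
threshold = 0.5
y_bin = (y_prob > threshold).astype(int)
score = f1_score(y_true, y_bin)
print("F1 after thresholding:", score)
```

If 0.5 is not the right cut-off for your problem, only the `threshold` value needs to change.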

A tested example of your evaluation function. It no longer raises the error:

from sklearn.metrics import f1_score

def xg_f1(y, t):
    t = t.get_label()  # true labels stored in the DMatrix
    y_bin = [1. if y_cont > 0.5 else 0. for y_cont in y]  # binarize the predicted probabilities
    return 'f1', f1_score(t, y_bin)
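
As a possible variation, the same idea can be written with NumPy and a configurable threshold. The sketch below rests on a few assumptions: it keeps the (predictions, DMatrix-like object) calling convention used in the answer, which may differ in newer XGBoost releases, and it returns 1 - F1 because early stopping with a custom metric may treat lower values as better, depending on your version and settings. The names `xg_f1_error` and `FakeDMatrix` are hypothetical; `FakeDMatrix` only exists so the function can be checked without xgboost installed.

```python
import numpy as np
from sklearn.metrics import f1_score


def xg_f1_error(y, t, threshold=0.5):
    """Return ("f1_err", 1 - F1) for predictions y and a DMatrix-like t.

    Hypothetical helper: the name and the 1 - F1 convention are
    assumptions, chosen so that lower is better for early stopping.
    """
    labels = t.get_label()
    y_bin = (np.asarray(y) > threshold).astype(int)  # binarize probabilities
    return "f1_err", 1.0 - f1_score(labels, y_bin)


class FakeDMatrix:
    """Hypothetical stand-in that exposes only get_label()."""

    def __init__(self, labels):
        self._labels = np.asarray(labels)

    def get_label(self):
        return self._labels


# Quick check with made-up numbers: thresholded predictions are
# [1, 0, 0, 1] against labels [1, 0, 1, 1].
name, value = xg_f1_error([0.9, 0.2, 0.4, 0.8], FakeDMatrix([1, 0, 1, 1]))
print(name, value)
```

Whether to return F1 or 1 - F1 depends on how your XGBoost version decides the direction of improvement, so it is worth confirming against the evaluation log of a short run.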