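I'm working on a regression problem, and I want to modify the loss function in the xgboost library so that my predictions are never lower than the actual values. I wrote this code: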
def custom_loss(preds, dtrain):
    labels = dtrain.get_label()
    df = preds - labels
    df = pd.DataFrame(df, columns=['val'])
    df['valg'] = df['val'].apply(lambda x: 10*abs(x) if x<0 else x)
    grad = df['valg'].as_matrix()
    return preds-labels, grad
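Essentially, I want to penalize predictions that fall below the actual value. However, this doesn't work: my predictions show no improvement. Can anyone help me figure out where I'm going wrong? Thanks.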
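For reference, xgboost expects a custom objective to return the gradient and the Hessian of the loss, in that order. The function above returns the plain residual first and the modified values second, so the modified values end up in the Hessian slot. Below is a minimal sketch of one way the stated intent could be written, assuming an asymmetric squared error in which under-predictions are weighted more heavily. The helper name `asymmetric_grad_hess` and the weight `UNDER_PENALTY` are hypothetical, and this loss only discourages under-prediction; it does not guarantee that predictions never fall below the actual values.

```python
import numpy as np

# Hypothetical weight applied when a prediction is below its label.
UNDER_PENALTY = 10.0


def asymmetric_grad_hess(preds, labels, under_penalty=UNDER_PENALTY):
    """Gradient and Hessian of 0.5 * w * (pred - label)**2,
    where w = under_penalty if pred < label, else 1."""
    residual = np.asarray(preds, dtype=float) - np.asarray(labels, dtype=float)
    weight = np.where(residual < 0, under_penalty, 1.0)
    grad = weight * residual  # negative for under-predictions, pushing them up
    hess = weight             # second derivative of the weighted squared error
    return grad, hess


def custom_loss(preds, dtrain):
    """Objective in the (preds, dtrain) -> (grad, hess) form xgboost calls."""
    return asymmetric_grad_hess(preds, dtrain.get_label())
```

Note that the gradient keeps the sign of the residual. Taking `abs()` of a negative residual, as in the code above, flips that sign and would push under-predictions further down rather than up.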
Edit: the whole Python script -
import pandas as pd
import xgboost as xgb

params = {"booster": "gbtree",
          "eta": 0.20,
          "max_depth": 4,
          "subsample": 0.75,
          "colsample_bytree": 0.65,
          "silent": 1,
          "eval_metric": "rmse",
          }
num_round = 400

def custom_loss(preds, dtrain):
    labels = dtrain.get_label()
    df = preds - labels
    df = pd.DataFrame(df, columns=['val'])
    df['valg'] = df['val'].apply(lambda x: 5*abs(x) if x<0 else x)
    grad = df['valg'].as_matrix()
    return preds-labels, grad

# X_train and X_test are DataFrames that include the target column 'price_act'
dtrain = xgb.DMatrix(X_train.drop('price_act', axis=1),
                     label=X_train['price_act'])
dtest = xgb.DMatrix(X_test.drop('price_act', axis=1),
                    label=X_test['price_act'])
watchlist = [(dtrain, 'train'), (dtest, 'eval')]
bst = xgb.train(params, dtrain, num_round, evals=watchlist, obj=custom_loss)
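Because the script reports only `rmse`, which treats over- and under-prediction alike, the watchlist output may not show whether under-predictions are actually becoming rarer. A custom evaluation metric could make that visible. This is a rough sketch assuming NumPy arrays; the names `under_prediction_rate` and `under_rate_metric` are hypothetical and not part of xgboost.

```python
import numpy as np


def under_prediction_rate(preds, labels):
    """Fraction of predictions that are strictly below their labels."""
    preds = np.asarray(preds, dtype=float)
    labels = np.asarray(labels, dtype=float)
    return float(np.mean(preds < labels))


def under_rate_metric(preds, dtrain):
    """Evaluation metric in the (preds, dtrain) -> (name, value) form."""
    return "under_rate", under_prediction_rate(preds, dtrain.get_label())
```

As far as I know, a function of this shape can be passed to `xgb.train` through its `feval` argument (called `custom_metric` in more recent releases), so the rate is printed for each entry in the watchlist alongside the other metrics.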