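I am trying to solve a regression problem in which I have to predict, when a machine breaks down, how long it will stay broken. The goal is to repair first the machines that are expected to stay broken for more than 30 days.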
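I think this problem could benefit from a custom loss function. In particular, the points to consider when writing this custom loss function are: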
To address the two points above, I think the starting code could look something like this:
```python
import pandas


def custom_differentiate_long_short_objective(preds, dtrain):
    labels = dtrain.get_label()
    # create a dataframe with predictions and labels (easier to work with)
    dataframe = pandas.DataFrame({"predictions": preds, "labels": labels})
    # differentiate between machines broken for >= 30 days and for fewer days
    dataframe_short = dataframe[dataframe["labels"] < 30]
    dataframe_long = dataframe[dataframe["labels"] >= 30]

    # machines broken for >= 30 days: asymmetric MSE that penalizes
    # underestimation (true target > prediction) 10 times more than
    # overestimation (true target < prediction)
    residual_long = (dataframe_long["labels"] - dataframe_long["predictions"]).astype("float")
    grad_long = residual_long.copy()
    grad_long[residual_long > 0] = -2 * 10.0 * grad_long[residual_long > 0]
    grad_long[residual_long <= 0] = -2 * grad_long[residual_long <= 0]
    hess_long = residual_long.copy()
    hess_long[residual_long > 0] = 2 * 10.0
    hess_long[residual_long <= 0] = 2.0

    # machines broken for < 30 days: asymmetric MSE that penalizes
    # overestimation (true target < prediction) 10 times more than
    # underestimation (true target > prediction)
    residual_short = (dataframe_short["labels"] - dataframe_short["predictions"]).astype("float")
    grad_short = residual_short.copy()
    grad_short[residual_short < 0] = -2 * 10.0 * grad_short[residual_short < 0]
    grad_short[residual_short >= 0] = -2 * grad_short[residual_short >= 0]
    hess_short = residual_short.copy()
    hess_short[residual_short < 0] = 2 * 10.0
    hess_short[residual_short >= 0] = 2.0

    # concat puts all short rows before all long rows; sort_index() restores
    # the original row order so each gradient lines up with its prediction
    grad = pandas.concat([grad_short, grad_long]).sort_index().values
    hess = pandas.concat([hess_short, hess_long]).sort_index().values
    return grad, hess
```
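For reference, the objective above is the weighted squared error loss = w·(y − p)², with residual r = y − p, gradient −2·w·r and Hessian 2·w, where w = 10 on the "heavy" side of each group and 1 otherwise. A minimal vectorized sketch of the same thing, assuming NumPy is available, uses `np.where` so every value stays in the original row order and no concat/reindex step is needed. The names `asymmetric_grad_hess` and `vectorized_long_short_objective`, and the `threshold`/`penalty` parameters, are hypothetical, introduced only for illustration.

```python
import numpy as np


def asymmetric_grad_hess(preds, labels, threshold=30.0, penalty=10.0):
    """Gradient and Hessian of the asymmetric squared error (hypothetical helper).

    Labels >= threshold: underestimation (label > prediction) gets `penalty`.
    Labels <  threshold: overestimation (label < prediction) gets `penalty`.
    """
    preds = np.asarray(preds, dtype=float)
    labels = np.asarray(labels, dtype=float)
    residual = labels - preds  # r = y - p
    long_mask = labels >= threshold
    # Pick the heavily weighted side of the error separately for each group.
    heavy = (long_mask & (residual > 0)) | (~long_mask & (residual < 0))
    weight = np.where(heavy, penalty, 1.0)
    # loss = w * r**2  =>  dloss/dp = -2 * w * r,  d2loss/dp2 = 2 * w
    grad = -2.0 * weight * residual
    hess = 2.0 * weight
    return grad, hess


def vectorized_long_short_objective(preds, dtrain):
    # dtrain only needs a get_label() method (e.g. an xgboost.DMatrix).
    return asymmetric_grad_hess(preds, dtrain.get_label())
```

Like the original function, `vectorized_long_short_objective` can be passed as the `obj` argument of `xgboost.train`. For example, `asymmetric_grad_hess([35, 45, 15, 5], [40, 40, 10, 10])` gives gradients `[-100, 10, 100, -10]` and Hessians `[20, 2, 20, 2]`.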
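What I would like to add now is to make the loss depend on the distance from the threshold... for y very close to 30, the loss should be higher... I'm stuck here... Any suggestions? Does this make sense?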
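One possible reading of "higher loss for y close to 30" is to multiply the asymmetric weight by an extra factor that is largest when the label sits on the threshold and decays towards 1 as |y − 30| grows. The sketch below assumes an exponential decay, 1 + alpha·exp(−|y − 30| / scale); the function name and the `alpha` and `scale` parameters are hypothetical choices, not something from the original question, and other decay shapes would work just as well.

```python
import numpy as np


def distance_weighted_grad_hess(preds, labels, threshold=30.0, penalty=10.0,
                                alpha=4.0, scale=5.0):
    """Asymmetric squared error whose weight grows as the label nears the threshold.

    alpha and scale are hypothetical tuning knobs: alpha is the extra weight
    for a label exactly at the threshold, scale (in days) controls how fast
    that extra weight decays with |label - threshold|.
    """
    preds = np.asarray(preds, dtype=float)
    labels = np.asarray(labels, dtype=float)
    residual = labels - preds  # r = y - p
    long_mask = labels >= threshold
    heavy = (long_mask & (residual > 0)) | (~long_mask & (residual < 0))
    side_weight = np.where(heavy, penalty, 1.0)
    # Extra factor in (1, 1 + alpha]: largest at the threshold, -> 1 far from it.
    closeness = 1.0 + alpha * np.exp(-np.abs(labels - threshold) / scale)
    weight = side_weight * closeness
    # The factor depends only on the label, not on the prediction,
    # so it just rescales the gradient and Hessian of w * r**2.
    grad = -2.0 * weight * residual
    hess = 2.0 * weight
    return grad, hess
```

Because the extra factor depends only on y, the Hessian stays positive and the loss remains a weighted squared error for each sample; with `alpha=0.0` it reduces to the plain asymmetric version.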