Custom loss as a function of residuals and true values

Asked: 2019-01-23 11:44:36

Tags: python regression xgboost loss-function

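I am trying to solve a regression problem in which, when a machine is in a broken state, I have to predict how long it will stay broken. The goal is to repair first the machines that are expected to stay broken for more than 30 days.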

I think this problem could benefit from a custom loss function. In particular, the points to consider when writing it are:

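  • We can be "relaxed" about predicting machines whose actual downtime is < 30 days as machines that will stay broken for >= 30 days
  • We cannot afford to predict machines whose actual downtime is >= 30 days as machines that will stay broken for < 30 days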

To address the two points above, I think this could be a starting point:

  • for y < 30, penalize overestimation more
  • for y >= 30, penalize underestimation more
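
To make the two rules concrete, here is a minimal sketch of the per-sample loss they imply, assuming a squared error multiplied by a penalty factor on the "costly" side. The helper name `asymmetric_loss` and its `threshold` and `penalty` arguments are hypothetical names introduced for illustration; the factor of 10 is the one used in the objective in this question.

```python
import numpy as np


def asymmetric_loss(preds, labels, threshold=30.0, penalty=10.0):
    """Per-sample weighted squared error (hypothetical helper, for illustration)."""
    preds = np.asarray(preds, dtype=float)
    labels = np.asarray(labels, dtype=float)
    residual = labels - preds  # > 0 means the model underestimated
    is_long = labels >= threshold
    # Long outages: underestimation is the costly side.
    # Short outages: overestimation is the costly side.
    costly = np.where(is_long, residual > 0, residual < 0)
    weight = np.where(costly, penalty, 1.0)
    return weight * residual ** 2


labels = np.array([40.0, 40.0, 10.0, 10.0])
preds = np.array([35.0, 45.0, 5.0, 15.0])
# Per-sample losses: 250, 25, 25, 250
print(asymmetric_loss(preds, labels))
```

Differentiating this with respect to the prediction gives a gradient of `-2 * weight * residual` and a Hessian of `2 * weight`, which is the form an XGBoost custom objective has to return.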

The code might look like this:

import pandas


def custom_differentiate_long_short_objective(preds, dtrain):
    labels = dtrain.get_label()
    # create a dataframe with predictions and labels (easier to work with)
    dataframe = pandas.DataFrame({"predictions": preds, "labels": labels})

    # differentiate between machines broken for >= 30 days and for less
    dataframe_short = dataframe[dataframe["labels"] < 30]
    dataframe_long = dataframe[dataframe["labels"] >= 30]

    # machines broken for >= 30 days: custom asymmetric MSE that gives 10 times
    # more penalty when the true targets are greater than the predictions
    # (underestimation) than when they are smaller (overestimation)
    residual_long = (dataframe_long["labels"] - dataframe_long["predictions"]).astype("float")
    grad_long = residual_long.copy()
    grad_long[residual_long > 0] = -2 * 10.0 * grad_long[residual_long > 0]
    grad_long[residual_long <= 0] = -2 * grad_long[residual_long <= 0]
    hess_long = residual_long.copy()
    hess_long[residual_long > 0] = 2 * 10.0
    hess_long[residual_long <= 0] = 2.0

    # machines broken for < 30 days: custom asymmetric MSE that gives 10 times
    # more penalty when the true targets are smaller than the predictions
    # (overestimation) than when they are greater (underestimation)
    residual_short = (dataframe_short["labels"] - dataframe_short["predictions"]).astype("float")
    grad_short = residual_short.copy()
    grad_short[residual_short < 0] = -2 * 10.0 * grad_short[residual_short < 0]
    grad_short[residual_short >= 0] = -2 * grad_short[residual_short >= 0]
    hess_short = residual_short.copy()
    hess_short[residual_short < 0] = 2 * 10.0
    hess_short[residual_short >= 0] = 2.0

    # concat puts the short rows before the long ones; sort_index restores the
    # original row order so that grad and hess line up with preds
    grad = pandas.concat([grad_short, grad_long]).sort_index().values
    hess = pandas.concat([hess_short, hess_long]).sort_index().values

    return grad, hess

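What I would like to add now is to make the loss depend on the distance from the threshold: for y very close to 30, the loss should be higher. I am stuck here. Any suggestions? Does this make sense?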
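
One possible direction, offered only as a sketch under stated assumptions: if the extra factor depends on the true value y alone and not on the prediction, it is a constant with respect to the prediction, so it simply scales both the gradient and the Hessian. The exponential shape, the function names `proximity_weight` and `weighted_asymmetric_grad_hess`, and the parameters `scale` and `boost` are all assumptions chosen for illustration; nothing here has been validated on the actual data.

```python
import numpy as np


def proximity_weight(labels, threshold=30.0, scale=5.0, boost=4.0):
    """Factor that equals 1 + boost at the threshold and decays towards 1 far from it.

    Hypothetical helper: the exponential form, `scale` and `boost` are assumptions.
    """
    labels = np.asarray(labels, dtype=float)
    return 1.0 + boost * np.exp(-np.abs(labels - threshold) / scale)


def weighted_asymmetric_grad_hess(preds, labels, threshold=30.0, penalty=10.0):
    """Gradient and Hessian of weight * (label - pred)^2 with respect to pred."""
    preds = np.asarray(preds, dtype=float)
    labels = np.asarray(labels, dtype=float)
    residual = labels - preds  # > 0 means underestimation
    is_long = labels >= threshold
    # Costly side: underestimation for long outages, overestimation for short ones.
    costly = np.where(is_long, residual > 0, residual < 0)
    weight = np.where(costly, penalty, 1.0) * proximity_weight(labels, threshold)
    grad = -2.0 * weight * residual
    hess = 2.0 * weight
    return grad, hess


def objective(preds, dtrain):
    # Same (preds, dtrain) signature as the objective in the question.
    return weighted_asymmetric_grad_hess(preds, dtrain.get_label())
```

Because everything is computed element by element, the returned arrays stay in the same order as `preds`. The asymmetric part of the weight still depends on the sign of the residual, which keeps the loss differentiable because the gradient is zero on both sides at residual 0.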

0 answers:

No answers yet