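Using Python, I created a custom loss function in H2O for a binary (0/1) classification problem, shown below. The idea is to minimize the total cost based on true positives, true negatives, false positives and false negatives. Here are the questions I would like answered: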
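1. Is the class label passed to the metric (predicted[0]) determined by the max-F1 threshold?
2. custom_metric_func can be used in GLM, DRF and GBM. However, although it works perfectly in GBM, it does not work in GLM (the loss function value defaults to 0). Any idea why?

Custom loss function: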
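```python
class CustomLossFunc:
    """Total misclassification cost for a binary (0/1) target."""

    def map(self, predicted, actual, weight, offset, model):
        # Cost assigned to each cell of the confusion matrix
        cost_tp = -9
        cost_tn = 0
        cost_fp = 1
        cost_fn = 10
        y = actual[0]          # true label
        y_pred = predicted[0]  # predicted = [class, p0, p1]
        if (y == 0) and (y_pred == 0):
            total_cost = cost_tn
        elif (y == 0) and (y_pred == 1):
            total_cost = cost_fp
        elif (y == 1) and (y_pred == 1):
            total_cost = cost_tp
        else:
            total_cost = cost_fn
        # [cost of this row, row count]
        return [total_cost, 1]

    def reduce(self, left, right):
        # Combine two partial results element-wise
        return [left[0] + right[0], left[1] + right[1]]

    def metric(self, last):
        # Final value: the summed cost over all rows
        return last[0]
```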
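Because map, reduce and metric are plain Python methods, the cost logic can be sanity-checked without a cluster. The sketch below is a compact restatement of CustomLossFunc (same costs, a lookup table instead of the if/elif chain, assuming labels are exactly 0 or 1) plus a hypothetical helper, evaluate_locally, that calls map once per row, folds the results with reduce and passes the total to metric. Running reduce as one sequential left fold is an assumption made for local testing; H2O combines partial results across data chunks in its own order, which does not matter here because the addition is order-independent. The weight, offset and model arguments are dummy values.

```python
from functools import reduce as fold


class CustomLossFunc:
    """Compact restatement of the cost metric: one [cost, row_count] pair per row."""

    # (actual label, predicted label) -> cost; assumes labels are exactly 0 or 1
    COSTS = {(0, 0): 0, (0, 1): 1, (1, 1): -9, (1, 0): 10}

    def map(self, predicted, actual, weight, offset, model):
        y = int(actual[0])          # true label (may arrive as a float such as 1.0)
        y_pred = int(predicted[0])  # predicted = [class, p0, p1]
        return [self.COSTS[(y, y_pred)], 1]

    def reduce(self, left, right):
        return [left[0] + right[0], left[1] + right[1]]

    def metric(self, last):
        return last[0]


def evaluate_locally(metric_cls, predictions, actuals):
    """Hypothetical helper: map every row, fold with reduce, then call metric."""
    m = metric_cls()
    # weight=1.0, offset=0.0, model=None are dummy arguments for local testing
    mapped = [m.map(p, [a], 1.0, 0.0, None) for p, a in zip(predictions, actuals)]
    return m.metric(fold(m.reduce, mapped))


# Hypothetical toy data: [class, p0, p1] per row and the matching true labels
preds = [[1, 0.2, 0.8], [0, 0.7, 0.3], [1, 0.4, 0.6], [0, 0.9, 0.1]]
labels = [1, 1, 0, 0]
print(evaluate_locally(CustomLossFunc, preds, labels))  # prints 2 (TP -9, FN +10, FP +1, TN 0)
```

If the local total matches what GBM reports on the same predictions, the metric itself is behaving as intended.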
I uploaded the loss function with h2o.upload_custom_metric() and then ran GLM and GBM to compare them:
```python
from h2o.estimators import H2OGeneralizedLinearEstimator, H2OGradientBoostingEstimator

# cost_loss_func is the reference returned by h2o.upload_custom_metric()

# GLM
glm_fit_cost = H2OGeneralizedLinearEstimator(family='binomial',
                                             model_id='glm_fit_cost',
                                             # standardize=True,
                                             custom_metric_func=cost_loss_func)
glm_fit_cost.train(x=x_co,
                   y=y_co,
                   training_frame=train_co_h2o,
                   validation_frame=valid_co_h2o)

# GBM
gbm_mod = H2OGradientBoostingEstimator(model_id="gbm_mod",
                                       custom_metric_func=cost_loss_func)
gbm_mod.train(y=y_co,
              x=x_co,
              training_frame=train_co_h2o,
              validation_frame=valid_co_h2o)
```
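On the threshold question: for binomial models, H2O typically assigns class 1 when p1 exceeds the threshold that maximizes F1, so predicted[0] reflects that threshold rather than one chosen for these costs. If the goal is the lowest total cost, one option is to search the threshold directly on p1. The sketch below does this offline on hypothetical toy data. total_cost and best_threshold are illustrative helpers, not H2O functions; in practice the labels and p1 values would come from your own validation frame (for example, the p1 column of the model's predictions).

```python
# (actual label, predicted label) -> cost, same values as CustomLossFunc
COSTS = {(0, 0): 0, (0, 1): 1, (1, 1): -9, (1, 0): 10}


def total_cost(labels, p1, threshold):
    """Total cost when every row with p1 >= threshold is labelled 1."""
    return sum(COSTS[(int(y), 1 if p >= threshold else 0)] for y, p in zip(labels, p1))


def best_threshold(labels, p1, candidates):
    """Candidate threshold with the lowest total cost (the first one wins ties)."""
    return min(candidates, key=lambda t: total_cost(labels, p1, t))


# Hypothetical toy data: true labels and predicted P(y = 1)
labels = [1, 1, 0, 0, 1]
p1 = [0.8, 0.3, 0.6, 0.1, 0.45]
candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
t = best_threshold(labels, p1, candidates)
print(t, total_cost(labels, p1, t))  # prints 0.15 -26
```

With these costs a false negative (+10) is ten times as expensive as a false positive (+1), so the search tends to favor low thresholds.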
I have tried the following:
Reference for the example I followed when creating my own loss function: