For gradient-boosted decision trees, I implemented a custom loss function that looks like this (and works):
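import numpy as np
from sklearn.preprocessing import OneHotEncoder

def softmax(mat):
    # Row-wise softmax: each row of class scores becomes probabilities.
    res = np.exp(mat)
    res = np.multiply(res, 1/np.sum(res, axis=1, keepdims=True))
    return res

def custom_asymmetric_objective(y_true, y_pred_encoded):
    # Flat scores -> (n_samples, 3), filled column by column (order='F').
    pred = y_pred_encoded.reshape((-1, 3), order='F')
    pred = softmax(pred)
    # Note: scikit-learn >= 1.2 names this argument sparse_output instead of sparse.
    y_true = OneHotEncoder(sparse=False, categories='auto').fit_transform(y_true.reshape(-1, 1))
    grad = (pred - y_true).astype("float")
    hess = 2.0 * pred * (1.0-pred)
    return grad.flatten('F'), hess.flatten('F')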
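To make the layout explicit: the reshape with order='F' reads the flat score vector class by class, so all samples' scores for the first class come first, then the second class, then the third. Here is a minimal sketch with made-up numbers, assuming three classes; the helper name unflatten_scores is hypothetical and only for illustration.

```python
import numpy as np

def unflatten_scores(flat_scores, n_classes=3):
    """Reshape a class-major flat score vector into (n_samples, n_classes)."""
    return np.asarray(flat_scores).reshape((-1, n_classes), order="F")

# Two samples, three classes: class-0 scores first, then class 1, then class 2.
flat = np.array([0.0, 1.0, 10.0, 11.0, 20.0, 21.0])
mat = unflatten_scores(flat)

# Row i of mat now holds the three class scores of sample i.
# Flattening with the same order restores the original flat vector.
roundtrip = mat.flatten("F")
```

This is also why grad and hess are flattened with 'F' at the end: they go back out in the same class-by-class order the scores came in.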
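Now I want to add a term to the objective function. It is computed by taking an existing DataFrame, adding a column to it, and then including that column in the loss function: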
def custom_asymmetric_objective(y_true, y_pred_encoded):
    pred = y_pred_encoded.reshape((-1, 3), order='F')
    pred = softmax(pred)
    y_true = OneHotEncoder(sparse=False, categories='auto').fit_transform(y_true.reshape(-1, 1))
    # calculate beta for each item in test data
    df2 = df.drop(['h', 'b','Label','w'], axis=1)
    betadf = df2.join(y_test, how = "right")
    betadf['pred']=y_pred_encoded
    overallmu = betadf['mu'].sum()
    betadf['w'] = (betadf['mu']/overallmu)
    label2value = {1: 0.11722, 2: 0.0124}
    factors = betadf['pred'].map(lambda n: label2value.get(n, 0.003))
    betadf['beta'] = betadf['w'] * (1 - ((betadf['sdL'] * factors) / betadf['muL']))
    # calculate deviance between beta and the average beta for each item
    average = 0.95/153
    betadf['penalty'] = 0
    betadf['penalty'].where(betadf['beta']-average > 0, average-betadf['beta'], inplace=True)
    pen = betadf['penalty']
    # get pen in same shape as y_true
    pen = OneHotEncoder(sparse=False, categories='auto').fit_transform(pen.reshape(-1, 1))
    grad = (pred - y_true + pen).astype("float")
    hess = 2.0 * pred * (1.0-pred)
    return grad.flatten('F'), hess.flatten('F')
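When I run this function, I get the error "Length of values does not match length of index". I checked pen separately and it looks fine, so I don't know where this error comes from.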
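One way to narrow this down: pandas raises this message when an array assigned to a DataFrame column has a different length than the frame's index. Assuming y_pred_encoded arrives as the same flattened n_samples × 3 vector that the first line of the function reshapes, the assignment betadf['pred'] = y_pred_encoded is one candidate, though not necessarily the only one. The sketch below reproduces the message with made-up data; frame and flat_scores are hypothetical stand-ins for betadf and y_pred_encoded.

```python
import numpy as np
import pandas as pd

n_samples, n_classes = 4, 3

# Hypothetical stand-in for betadf: one row per sample.
frame = pd.DataFrame({"mu": np.ones(n_samples)})

# Hypothetical stand-in for y_pred_encoded: one score per sample per class, flattened.
flat_scores = np.zeros(n_samples * n_classes)

try:
    frame["pred"] = flat_scores  # 12 values assigned to a 4-row index
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Reducing to one value per row first (here the arg-max class index,
# purely as an illustration) gives an array that fits the index.
per_row = flat_scores.reshape((-1, n_classes), order="F").argmax(axis=1)
frame["pred"] = per_row
```

Printing len(y_pred_encoded) and len(betadf) inside the objective would show whether the two lengths actually differ in the real run.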