Question

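I'm doing some machine learning and I have to deal with a custom loss function. The derivatives and Hessian of the loss function are difficult to derive, so I have resorted to computing them automatically with Tensorflow.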

Here is an example.

import numpy as np
import tensorflow as tf

y_true = np.array([
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1]
], dtype=float)

y_pred = np.array([
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1]
], dtype=float)

weights = np.array([1, 1, 1, 1, 1], dtype=float)

with tf.Session():

    # We first convert the numpy arrays to Tensorflow tensors
    y_true = tf.convert_to_tensor(y_true)
    y_pred = tf.convert_to_tensor(y_pred)
    weights = tf.convert_to_tensor(weights)

    # The following code block is a custom loss 
    ys = tf.reduce_sum(y_true, axis=0)
    y_true = y_true / ys
    ln_p = tf.nn.log_softmax(y_pred)
    wll = tf.reduce_sum(y_true * ln_p, axis=0)
    loss = -tf.tensordot(weights, wll, axes=1)

    grad = tf.gradients(loss, y_pred)[0]

    hess = tf.hessians(loss, y_pred)[0]
    hess = tf.diag_part(hess)

    print(hess.eval())

This prints:

[[0.24090069 0.12669198 0.12669198 0.12669198 0.12669198]
 [0.12669198 0.24090069 0.12669198 0.12669198 0.12669198]
 [0.12669198 0.12669198 0.12669198 0.24090069 0.12669198]
 [0.12669198 0.12669198 0.24090069 0.12669198 0.12669198]
 [0.04223066 0.04223066 0.04223066 0.04223066 0.08030023]
 [0.04223066 0.04223066 0.04223066 0.04223066 0.08030023]
 [0.04223066 0.04223066 0.04223066 0.04223066 0.08030023]]
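
As a sanity check on the numbers above: for this particular toy loss (not for custom losses in general) the diagonal can be written by hand. With t the column-normalized y_true and p = softmax(y_pred), the second derivative with respect to y_pred[i, j] works out to (sum_k weights[k] * t[i, k]) * p[i, j] * (1 - p[i, j]). The NumPy sketch below is only a cross-check under that assumption; the names t, p and row_weight are introduced here for illustration.

```python
import numpy as np

# Same data as in the question: y_pred is identical to y_true.
y_true = np.array([
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1]
], dtype=float)
y_pred = y_true.copy()
weights = np.ones(5)

# Column-normalized targets, as in `y_true / ys` in the loss.
t = y_true / y_true.sum(axis=0)

# Row-wise softmax (shifted by the row max for numerical stability).
z = y_pred - y_pred.max(axis=1, keepdims=True)
p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)

# Each row contributes with weight sum_k weights[k] * t[i, k].
row_weight = t @ weights

# Closed-form Hessian diagonal for this specific loss only.
hess_diag = row_weight[:, None] * p * (1.0 - p)
print(hess_diag)
```

If the closed form is right, the printed array should agree with the tf.hessians output above to within floating-point precision.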

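I'm happy with this because it works; the problem is that it doesn't scale. For my use case I only need the diagonal of the Hessian matrix. I managed to extract it with hess = tf.diag_part(hess), but this still computes the full Hessian, which is unnecessary. The overhead is so large that I can't use it on moderately sized datasets (~100k rows).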
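
One possible way around the full Hessian, sketched here under a strong assumption: if row i of the gradient depends only on row i of y_pred (true for the example loss, since log_softmax acts row by row and ys depends on y_true alone), then differentiating the row-sum of gradient column j and keeping column j of the result yields that column of the diagonal. This needs one extra tf.gradients call per column instead of a Hessian with (rows * cols)^2 entries. The helper name hessian_diag_rowwise is hypothetical, and the sketch imports tensorflow.compat.v1 so that the graph-mode calls from the question are available on newer installs as well; it is not valid for losses that couple different rows of y_pred.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # graph mode, as in the question's tf.Session code


def example_loss(y_pred, y_true, weights):
    # The custom loss from the question, unchanged apart from being a function.
    y_true = tf.convert_to_tensor(y_true)
    weights = tf.convert_to_tensor(weights)
    ys = tf.reduce_sum(y_true, axis=0)
    y_true = y_true / ys
    ln_p = tf.nn.log_softmax(y_pred)
    wll = tf.reduce_sum(y_true * ln_p, axis=0)
    return -tf.tensordot(weights, wll, axes=1)


def hessian_diag_rowwise(loss_fn, y_pred_value):
    # Hypothetical helper: ASSUMES the Hessian has no cross-row terms.
    n_cols = y_pred_value.shape[1]
    with tf.Graph().as_default(), tf.Session() as sess:
        y_pred = tf.constant(y_pred_value)
        grad = tf.gradients(loss_fn(y_pred), y_pred)[0]
        cols = []
        for j in range(n_cols):
            # Summing over rows is harmless only because grad[i, j]
            # does not depend on any other row of y_pred.
            second = tf.gradients(tf.reduce_sum(grad[:, j]), y_pred)[0]
            cols.append(second[:, j])
        return sess.run(tf.stack(cols, axis=1))


# Same data as the question (rows of the identity matrix, reordered).
y_true = np.eye(5)[[0, 1, 3, 2, 4, 4, 4]]
y_pred = y_true.copy()
weights = np.ones(5)

diag = hessian_diag_rowwise(lambda yp: example_loss(yp, y_true, weights), y_pred)
print(diag)
```

The number of backward passes grows with the number of columns (5 here) rather than with the number of rows, which is the dimension that is large in the ~100k-row case.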

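So my question is: is there a better way to extract the diagonal of the Hessian? I'm aware of this post and this one, but I don't find the answers good enough.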

Hessian diagonal with Tensorflow

0 answers: