Question

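I am using TensorFlow v2-rc0 to build and train a neural network. My loss function is quite complex and cannot be expressed with tensor operations. For this reason I would like to run the loss function in eager mode but construct a graph for all model-related computations. To compute the gradients for training, I need to compute the Jacobians of the network and of the loss function separately and take their dot product.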
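Written out, the chain rule behind this is roughly the following. The notation is introduced here for illustration and does not appear in the code: $\theta$ is one trainable variable, $y_{b,i}$ is output $i$ for sample $b$, $B$ is the batch size, and $\ell_b$ is the loss of sample $b$. It assumes the total loss is the mean of per-sample losses, as in the mock-up.

```latex
L = \frac{1}{B} \sum_{b=1}^{B} \ell_b,
\qquad
\frac{\partial L}{\partial \theta}
  = \frac{1}{B} \sum_{b=1}^{B} \sum_{i}
    \frac{\partial \ell_b}{\partial y_{b,i}}\,
    \frac{\partial y_{b,i}}{\partial \theta}
```

The factor $\partial y_{b,i} / \partial \theta$ is the network Jacobian, and $\partial \ell_b / \partial y_{b,i}$ is the Jacobian of the loss with respect to the outputs.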

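Here is how I implemented the gradient computation. This is only a mock-up of the final code; I use NumPy here to emulate the loss function's need for eager execution.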

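import numpy as np
import tensorflow as tf

# Stand-in for the real loss: evaluated with NumPy, i.e. eagerly.
def mse(yTrue, yPred):
    return np.mean((yTrue - yPred)**2)

# Derivative of mse with respect to yPred for a single sample.
def msejac(yTrue, yPred):
    yt = np.array(yTrue)
    yp = np.array(yPred)
    return -2.0/len(yt) * (yt - yp)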

@tf.function
def jacobian(model, inputs):
    with tf.GradientTape() as gtape:
        outputs = model(inputs)
    jacs = gtape.jacobian(outputs, model.trainable_variables)
    return outputs, jacs

def grad(model, inputs, targets):
    outputs, outjacs = jacobian(model, inputs)
    loss_value = mse(targets, outputs)  # argument order matches mse(yTrue, yPred)

    # Chain rule: contract dLoss/dOutputs with dOutputs/dVariable for each
    # sample, then average over the batch.
    grads = [
        np.mean([tf.tensordot(msejac(targets[b, :], outputs[b, :]),
                              outjac[b, :],
                              [-1, 0]).numpy()
                 for b in range(inputs.shape[0])], axis=0)  # iterate over samples in the batch
        for outjac in outjacs]   # iterate over model variables

    return loss_value, grads
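Independently of whether the full Jacobian is needed at all, the Python loop over batch entries in grad() can in principle be collapsed into a single contraction over the batch and output axes. The following is a minimal NumPy sketch with random stand-in arrays; the names `contract_loop` and `contract_batched` and the shapes are illustrative assumptions, not part of the code above.

```python
import numpy as np

def contract_loop(dloss, jac):
    # Per-sample contraction over the output axis, then a mean over the batch,
    # mirroring the list comprehension in grad().
    return np.mean(
        [np.tensordot(dloss[b], jac[b], axes=[-1, 0]) for b in range(jac.shape[0])],
        axis=0,
    )

def contract_batched(dloss, jac):
    # Contract the batch axis and the output axis in one call,
    # then divide by the batch size to reproduce the mean.
    return np.tensordot(dloss, jac, axes=([0, 1], [0, 1])) / jac.shape[0]

rng = np.random.default_rng(0)
B, O = 8, 3                             # hypothetical batch size and output width
dloss = rng.normal(size=(B, O))         # stand-in for dLoss/dOutputs per sample
jac = rng.normal(size=(B, O, 4, 5))     # stand-in for dOutputs/dVariable, variable shape (4, 5)

loop_result = contract_loop(dloss, jac)
batched_result = contract_batched(dloss, jac)
```

Both functions return an array with the shape of the variable. The same `axes` specification should carry over to `tf.tensordot`, though the speed-up for a real model would have to be measured.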

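The Jacobian computation and the dot product at the end of grad are by far the slowest part of my training loop. They are several orders of magnitude slower than the equivalent computation that derives the gradients directly from the loss, i.e.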

def tf_mse(yTrue, yPred):
    return tf.math.reduce_mean((yTrue - yPred)**2)

@tf.function
def tf_grad(model, inputs, targets):
    with tf.GradientTape() as gtape:
        outputs = model(inputs)
        lossValue = tf_mse(targets, outputs)  # argument order matches tf_mse(yTrue, yPred)
    return lossValue, gtape.gradient(lossValue, model.trainable_variables)

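When I reduce the batch size, the Jacobian gets smaller, so my code becomes more efficient relative to the rest of the training loop. However, even with a batch size of 1 it does not reach the performance of the pure-TF implementation. Besides, I need a fairly large batch size for the loss function to be meaningful.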

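Is there any way to speed this up? Is computing the Jacobian even a sensible thing to do in this situation, or is there a better way to obtain the gradients?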
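One direction that may be worth exploring is a vector-Jacobian product: `tf.GradientTape.gradient` accepts an `output_gradients` argument, so the eagerly computed derivative of the loss with respect to the outputs can be passed into the backward pass without ever materializing the full Jacobian. The sketch below makes two assumptions: the loss gradient is available as a batched array (here `mse_jac_batch`, a hypothetical batched variant of msejac), and the forward pass runs eagerly rather than inside `@tf.function`. `grad_vjp` is likewise an illustrative name.

```python
import numpy as np
import tensorflow as tf

def mse_jac_batch(y_true, y_pred):
    # Eager/NumPy derivative of mean((y_true - y_pred)**2), taken over all
    # elements of the batch, with respect to y_pred.
    yt = np.asarray(y_true, dtype=np.float32)
    yp = np.asarray(y_pred, dtype=np.float32)
    return (-2.0 / yt.size) * (yt - yp)

def grad_vjp(model, inputs, targets):
    # Record only the forward pass of the model on the tape.
    with tf.GradientTape() as tape:
        outputs = model(inputs)

    # The loss and its derivative w.r.t. the outputs are computed outside
    # the tape, standing in for a loss that cannot be written in tensor ops.
    loss_value = np.mean((np.asarray(targets) - outputs.numpy()) ** 2)
    dloss = tf.constant(mse_jac_batch(targets, outputs.numpy()), dtype=outputs.dtype)

    # Backpropagate dloss through the model: a vector-Jacobian product.
    grads = tape.gradient(outputs, model.trainable_variables,
                          output_gradients=dloss)
    return loss_value, grads
```

For the MSE mock-up this should yield the same gradients as tf_grad. Whether it closes the performance gap for a real model and loss is not something this sketch establishes.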

Speeding up gradient computation in custom training with an eager-mode loss function in TensorFlow
