Accumulating gradients in TensorFlow eager execution

Time: 2020-01-13 18:19:45

Tags: python tensorflow deep-learning

I couldn't find any resources online on accumulating gradients in TensorFlow eager execution, so I wrote something along these lines:

import tensorflow as tf

def process(model, dataloader, optimizer, accum_steps):
    accum_iter = 0
    # running sum of gradients, one tensor per trainable variable
    accum_grads = [tf.zeros_like(x) for x in model.trainable_variables]

    for batch in dataloader:
        with tf.GradientTape() as tape:
            loss = compute_loss(batch, model)
        grads = tape.gradient(target=loss, sources=model.trainable_variables)

        accum_iter += 1
        if accum_iter % accum_steps == 0:
            # average the accumulated gradients and apply them
            accum_grads = [ag/accum_steps for ag in accum_grads]
            optimizer.apply_gradients(zip(accum_grads, model.trainable_variables))
            accum_iter = 0
        else:
            # otherwise, add this batch's gradients to the running sum
            accum_grads = [g+ag for g, ag in zip(grads, accum_grads)]
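compute_loss above is a placeholder for my per-batch loss. For a self-contained run, a toy setup along these lines works (mean-squared-error on random data; the real model and data don't matter for the question):

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.build((None, 3))  # build the model so trainable_variables is populated
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

# toy data: 8 examples served in batches of 2
x = tf.random.normal([8, 3])
y = tf.random.normal([8, 1])
dataloader = tf.data.Dataset.from_tensor_slices((x, y)).batch(2)

def compute_loss(batch, model):
    xb, yb = batch
    return tf.reduce_mean(tf.square(model(xb) - yb))

process(model, dataloader, optimizer, accum_steps=2)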

The gradients from a single batch of size 4 come out very different from the gradients accumulated over two batches of size 2. I think something is going wrong in the code above. Is there an error in it?
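To make my expectation concrete: if the loss is a mean over the batch (and the model has no batch-dependent layers such as batch normalization), the gradient on one batch of 4 should match the average of the gradients on its two halves of size 2, up to floating-point noise. A minimal sketch of that check, again on a toy model and data of my own rather than the real training setup:

import tensorflow as tf

tf.random.set_seed(0)
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.build((None, 3))  # build so trainable_variables exist

x = tf.random.normal([4, 3])
y = tf.random.normal([4, 1])

def grads_on(xb, yb):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(model(xb) - yb))
    return tape.gradient(loss, model.trainable_variables)

full = grads_on(x, y)        # one batch of 4
g1 = grads_on(x[:2], y[:2])  # first half, batch of 2
g2 = grads_on(x[2:], y[2:])  # second half, batch of 2
avg = [(a + b) / 2 for a, b in zip(g1, g2)]

for f, a in zip(full, avg):
    print(float(tf.reduce_max(tf.abs(f - a))))  # ~0 if the two agree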

0 Answers:

No answers yet.