I couldn't find any resource online on accumulating gradients with TensorFlow eager execution, so I wrote something like this -
def process(model, dataloader, optimizer):
    accum_iter = 0
    accum_grads = [tf.zeros_like(x) for x in model.trainable_variables]
    for batch in dataloader:
        with tf.GradientTape() as tape:
            loss = compute_loss(batch, model)
        grads = tape.gradient(target=loss, sources=model.trainable_variables)
        accum_iter += 1
        if accum_iter % accum_steps == 0:
            accum_grads = [ag / accum_steps for ag in accum_grads]
            optimizer.apply_gradients(zip(accum_grads, model.trainable_variables))
            accum_iter = 0
        else:
            accum_grads = [g + ag for g, ag in zip(grads, accum_grads)]
The gradients from a single batch of size 4 are very different from the gradients accumulated over two batches of size 2. I think something is going wrong in the code above. Is there a bug in it?
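For reference, this is a minimal sketch of the comparison I expected to hold (the toy linear model and data here are hypothetical, just for illustration): for a mean-reduced loss, the gradient over one batch of 4 should equal the average of the gradients over two batches of 2.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy model: a single scalar weight with an MSE loss.
w = tf.Variable(2.0)

def grad_of_mean_loss(x, y):
    # Gradient of mean((w*x - y)^2) with respect to w.
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(tf.square(w * x - y))
    return tape.gradient(loss, w)

x = tf.constant([1.0, 2.0, 3.0, 4.0])
y = tf.constant([3.0, 5.0, 7.0, 9.0])  # y = 2x + 1, so residual w*x - y = -1

g_full = grad_of_mean_loss(x, y)                   # one batch of size 4
g_accum = (grad_of_mean_loss(x[:2], y[:2]) +
           grad_of_mean_loss(x[2:], y[2:])) / 2.0  # two batches of size 2, averaged

print(float(g_full), float(g_accum))
assert np.allclose(float(g_full), float(g_accum))
```

With my `process` function above, however, the applied gradients do not match the full-batch gradients like this, which is why I suspect a bug.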