TensorFlow version: 2.0.0
I am training two stacked models, model1 and model2, in TensorFlow 2.0.0, with the goal of eventually running them on two different machines. For now I am simulating both on the same machine. If I train both models under the same gradient tape, training works perfectly, computing the gradients as:
import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

with tf.GradientTape(persistent=True) as tape:
    features = model1(x_batch_train)
    logits = model2(features)
    loss_value = loss_fn(y_batch_train, logits)

# Computing gradients for model 2
grads2 = tape.gradient(loss_value, model2.trainable_variables)
# Computing gradients to pass to model 1
grads_pass = tape.gradient(loss_value, features)
# Computing gradients for model 1, seeding the chain rule with grads_pass
grads1 = tape.gradient(features, model1.trainable_variables, output_gradients=grads_pass)
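For context, after computing the gradients I apply them with two separate optimizers, roughly like this (the Adam optimizers and learning rate are just my setup and not essential to the question):

# Hypothetical optimizers, one per model; any tf.keras optimizer works the same way
optimizer1 = tf.keras.optimizers.Adam(learning_rate=1e-3)
optimizer2 = tf.keras.optimizers.Adam(learning_rate=1e-3)

# Apply the gradients computed above to each model's own variables
optimizer2.apply_gradients(zip(grads2, model2.trainable_variables))
optimizer1.apply_gradients(zip(grads1, model1.trainable_variables))

# A persistent tape is not freed automatically, so release it once done
del tape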
However, if I want to split the training across two different machines, each machine needs its own gradient tape, and then the computed gradients come out different!
import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

with tf.GradientTape() as tape1:
    features = model1(x_batch_train)

with tf.GradientTape(persistent=True) as tape2:
    logits = model2(features)
    loss_value = loss_fn(y_batch_train, logits)

# Computing gradients for model 2
grads2 = tape2.gradient(loss_value, model2.trainable_variables)
# Computing gradients to pass to model 1
grads_pass = tape2.gradient(loss_value, features)
# Computing gradients for model 1, seeding tape1 with grads_pass
grads1 = tape1.gradient(features, model1.trainable_variables, output_gradients=grads_pass)
The operations recorded on the two tapes are independent of each other, so why does this happen?
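For completeness, here is a minimal stand-in for the models and data I use while simulating both machines locally (the layer sizes and batch are arbitrary placeholders; the discrepancy shows up regardless):

import tensorflow as tf

# Hypothetical stand-in models; the real ones are larger but behave the same
model1 = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(16,)),
])
model2 = tf.keras.Sequential([
    tf.keras.layers.Dense(10),  # logits for 10 classes
])

# Random batch standing in for real training data
x_batch_train = tf.random.normal((8, 16))
y_batch_train = tf.random.uniform((8,), maxval=10, dtype=tf.int32)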