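I'm trying to train a model on multiple GPUs using MirroredStrategy. I've been following this tutorial: https://www.tensorflow.org/tutorials/distribute/custom_training
I followed the procedure almost exactly, but I still get the following error: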
INFO:tensorflow:Error reported to Coordinator: iterating over `tf.Tensor` is not allowed in Graph execution. Use Eager execution or decorate this function with @tf.function.
I created the dataset like this:
dataset = tf.data.Dataset.from_tensor_slices(list_ds)
train_dataset = dataset.batch(GLOBAL_BATCH_SIZE, drop_remainder=True)
train_dist_dataset = strategy.experimental_distribute_dataset(train_dataset)
`list_ds` is a list of lists of strings.
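A quick way to see what this pipeline actually yields is a minimal sketch like the one below. The contents of `list_ds` and the value of `GLOBAL_BATCH_SIZE` are hypothetical stand-ins. `from_tensor_slices` needs every inner list to have the same length, because the whole list is first converted to one dense tensor. Each dataset element is then a single `tf.string` tensor, not an `(inputs, labels)` tuple like the tutorial's image/label pairs.

```python
import tensorflow as tf

# Hypothetical stand-in for list_ds: a list of equal-length lists of strings.
list_ds = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"], ["j", "k", "l"]]
GLOBAL_BATCH_SIZE = 2  # hypothetical value for illustration

dataset = tf.data.Dataset.from_tensor_slices(list_ds)
train_dataset = dataset.batch(GLOBAL_BATCH_SIZE, drop_remainder=True)

# Each element is one tf.string tensor, not an (inputs, labels) tuple.
print(dataset.element_spec)        # shape (3,), dtype tf.string
print(train_dataset.element_spec)  # shape (2, 3), dtype tf.string (static batch dim)
```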
This is what my training loop looks like:
@tf.function
def distributed_train_step(dataset_inputs):
    strategy.experimental_run_v2(train_step, args=dataset_inputs)

for epoch in range(EPOCHS):
    for batch in train_dist_dataset:
        distributed_train_step(batch)
I also ran the exact code from the tutorial and it worked, but when I apply the same code to my own model, I get the error above.
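One plausible cause, assuming `train_step` takes a single batch argument: the replica function is called with `args` unpacked as positional arguments, so `args` is expected to be a tuple. The tutorial wraps the batch as `args=(dataset_inputs,)`. Passing a bare string tensor as `args=dataset_inputs` would make TensorFlow try to iterate over that tensor to unpack it, which is not allowed in graph mode. The sketch below shows the tuple form end to end. It uses `strategy.run`, the name `experimental_run_v2` was given in TF 2.2 (on TF 2.0/2.1, substitute `experimental_run_v2`). The data and the body of `train_step` are hypothetical placeholders, not a real model.

```python
import tensorflow as tf

# Uses all visible GPUs; falls back to the CPU if none are available.
strategy = tf.distribute.MirroredStrategy()
GLOBAL_BATCH_SIZE = 4  # hypothetical value

# Hypothetical data: 8 equal-length lists of strings.
list_ds = [["a", "b"], ["c", "d"], ["e", "f"], ["g", "h"],
           ["i", "j"], ["k", "l"], ["m", "n"], ["o", "p"]]

dataset = tf.data.Dataset.from_tensor_slices(list_ds)
train_dataset = dataset.batch(GLOBAL_BATCH_SIZE, drop_remainder=True)
train_dist_dataset = strategy.experimental_distribute_dataset(train_dataset)

def train_step(inputs):
    # Hypothetical per-replica step: counts the strings in this replica's shard.
    # A real step would run the model, compute the loss, and apply gradients.
    return tf.cast(tf.size(inputs), tf.float32)

@tf.function
def distributed_train_step(dataset_inputs):
    # args is a one-element tuple, so train_step receives the batch as its only argument.
    per_replica = strategy.run(train_step, args=(dataset_inputs,))
    # Sum the per-replica results into one scalar.
    return strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica, axis=None)

totals = []
for batch in train_dist_dataset:
    totals.append(float(distributed_train_step(batch)))
print(totals)
```

With 8 rows and a global batch size of 4 there are two batches, and the replica results for each batch sum to the 8 strings in that batch, however many replicas the batch was split across.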