Question

Hi all, I have a small question about learning rate decay in the multi-GPU training of the TensorFlow cifar10 example.

Here is the code:

# Create a variable to count the number of train() calls. This equals the
# number of batches processed * FLAGS.num_gpus.
global_step = tf.get_variable(
    'global_step', [],
    initializer=tf.constant_initializer(0), trainable=False)

# Calculate the learning rate schedule.
num_batches_per_epoch = (cifar10.NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN /
                         FLAGS.batch_size)
decay_steps = int(num_batches_per_epoch * cifar10.NUM_EPOCHS_PER_DECAY)
# Decay the learning rate exponentially based on the number of steps.
lr = tf.train.exponential_decay(cifar10.INITIAL_LEARNING_RATE,
                                global_step,
                                decay_steps,
                                cifar10.LEARNING_RATE_DECAY_FACTOR,
                                staircase=True)

This code does not take the number of GPUs into account. For example, if we increase FLAGS.num_gpus to 4, decay_steps does not change.

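According to the comment, global_step should equal the number of batches processed * FLAGS.num_gpus. However, global_step is only incremented when opt.apply_gradients() is called, so it increases by just 1 per iteration.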

In my opinion, the code should be:

decay_steps = int(num_batches_per_epoch * cifar10.NUM_EPOCHS_PER_DECAY/FLAGS.num_gpus)

The reason is that with multiple GPUs, fewer iterations are needed to go through one epoch.
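To make the arithmetic concrete, here is a minimal sketch in plain Python (no TensorFlow needed). It assumes each GPU tower consumes its own batch of FLAGS.batch_size examples per iteration, and it uses illustrative values (50000 training examples, batch size 128, 350 epochs per decay) that I believe match the example's defaults but have not verified against the current source. The helper name decay_steps_for is hypothetical.

```python
def decay_steps_for(num_examples, batch_size, epochs_per_decay, num_gpus=1):
    """Number of training iterations between learning rate decays.

    Assumption: one iteration processes batch_size * num_gpus examples,
    because every GPU tower gets its own batch.
    """
    examples_per_iteration = batch_size * num_gpus
    iterations_per_epoch = num_examples / examples_per_iteration
    return int(iterations_per_epoch * epochs_per_decay)


# Illustrative values, assumed rather than read from the cifar10 module.
NUM_EXAMPLES = 50000
BATCH_SIZE = 128
EPOCHS_PER_DECAY = 350.0

for num_gpus in (1, 2, 4):
    steps = decay_steps_for(NUM_EXAMPLES, BATCH_SIZE, EPOCHS_PER_DECAY, num_gpus)
    print(num_gpus, steps)
```

Under these assumptions, decay_steps shrinks roughly in proportion to the number of GPUs, whereas the original code yields the single-GPU value regardless of FLAGS.num_gpus.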

Please correct me if I am wrong and help me understand whether my logic is right.

Confusion about learning rate decay when using multiple GPUs in the TensorFlow cifar10 example

0 answers: