I am using update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS) to train BatchNorm layers on multiple GPUs. During the training phase I run it like this:
with tf.device('/cpu:0'):
    update_ops = []
    with tf.variable_scope(tf.get_variable_scope()):
        for i in range(self.conf.num_gpus):
            with tf.device('/gpu:%d' % i):
                with tf.name_scope('device_%d' % i):
                    update_ops.extend(tf.get_collection(tf.GraphKeys.UPDATE_OPS))
    variable_averages = tf.train.ExponentialMovingAverage(self.conf.MOVING_AVERAGE_DECAY, global_step)
    variables_averages_op = variable_averages.apply(tf.trainable_variables())
    with tf.control_dependencies(update_ops):
        self.train_op = tf.group(train_op_conv, variables_averages_op)
However, I have found that this function is used in several different ways:

1. Inside the loop, as in cifar10_main:
with tf.device('/cpu:0'):
    with tf.variable_scope(tf.get_variable_scope()):
        for i in range(self.conf.num_gpus):
            with tf.device('/gpu:%d' % i):
                with tf.name_scope('device_%d' % i):
                    # Ignore the update_ops line here
                    variable_averages = tf.train.ExponentialMovingAverage(self.conf.MOVING_AVERAGE_DECAY, global_step)
                    variables_averages_op = variable_averages.apply(tf.trainable_variables())
                    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
                    with tf.control_dependencies(update_ops):
                        self.train_op = tf.group(train_op_conv, variables_averages_op)
2. Outside the loop, as in cifar10_multi_gpu:
with tf.device('/cpu:0'):
    with tf.variable_scope(tf.get_variable_scope()):
        for i in range(self.conf.num_gpus):
            with tf.device('/gpu:%d' % i):
                with tf.name_scope('device_%d' % i):
                    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    variable_averages = tf.train.ExponentialMovingAverage(self.conf.MOVING_AVERAGE_DECAY, global_step)
    variables_averages_op = variable_averages.apply(tf.trainable_variables())
    batchnorm_updates_op = tf.group(*update_ops)
    self.train_op = tf.group(train_op_conv, train_op_fc, variables_averages_op, batchnorm_updates_op)
3. Both inside and outside the loop, as in inception v3 and cifar10.
What is the correct way to do this? It seems to me that it is probably the second one.
Answer 0 (score: 1)
The snippet you provide for the first approach does not match the cifar10 example. The approach taken in the cifar10_main example collects and applies only the update ops originating from the first tower, as a heuristic code optimization. Here is the relevant snippet:
with tf.variable_scope('resnet', reuse=bool(i != 0)):
    with tf.name_scope('tower_%d' % i) as name_scope:
        with tf.device(device_setter):
            loss, gradvars, preds = _tower_fn(
                is_training, weight_decay, tower_features[i], tower_labels[i],
                data_format, params.num_layers, params.batch_norm_decay,
                params.batch_norm_epsilon)
            tower_losses.append(loss)
            tower_gradvars.append(gradvars)
            tower_preds.append(preds)
            if i == 0:
                # Only trigger batch_norm moving mean and variance update from
                # the 1st tower. Ideally, we should grab the updates from all
                # towers but these stats accumulate extremely fast so we can
                # ignore the other stats from the other towers without
                # significant detriment.
                update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS,
                                               name_scope)
Note that in the snippet above, the update ops are restricted to those coming from the first tower by passing name_scope as the scope argument to tf.get_collection.
The second approach, by contrast, applies all update ops from all towers.
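To see that filtering concretely, here is a minimal standalone sketch (assuming TensorFlow 1.x; the tower_0/tower_1 and bn_0/bn_1 names are invented for this illustration and are not taken from either tutorial):

import tensorflow as tf

# Two toy towers, each with one batch-norm layer (illustration only). In
# training mode, every batch_normalization layer registers its moving-average
# update ops in the UPDATE_OPS collection under the enclosing name scope.
x = tf.placeholder(tf.float32, [None, 8])
for i in range(2):
    with tf.name_scope('tower_%d' % i):
        tf.layers.batch_normalization(x, training=True, name='bn_%d' % i)

all_updates = tf.get_collection(tf.GraphKeys.UPDATE_OPS)                 # all towers
tower0_updates = tf.get_collection(tf.GraphKeys.UPDATE_OPS, 'tower_0')   # first tower only
print(len(all_updates), len(tower0_updates))  # 4 vs. 2

Passing a scope makes tf.get_collection return only the items whose names match that prefix, which is exactly how the cifar10_main snippet keeps only the first tower's updates.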
The third approach, as you wrote it, is a variant of the first. However, the linked inception_v3 file is actually similar to the cifar10_main example.
As to which one is the correct approach: it depends. Applying the update ops selectively reduces the time per training step while sacrificing (some definition of) correctness, whereas applying all of the update ops increases the time per training step. In practice, try both and see which trade-off works better for you.
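If you do go the all-towers route, one way to wire it up is sketched below. This is only an outline under assumptions (TensorFlow 1.x; the two-tower toy model, num_gpus value and optimizer are invented for illustration and not taken from the linked examples):

import tensorflow as tf

# Illustrative toy setup: two towers sharing variables, each with a
# batch-norm layer. Collect the update ops per tower, scoped to that tower's
# name scope so nothing is gathered twice, then make the train op depend on
# all of them.
num_gpus = 2
x = tf.placeholder(tf.float32, [None, 8])
y = tf.placeholder(tf.float32, [None, 1])

all_update_ops = []
tower_losses = []
for i in range(num_gpus):
    with tf.device('/gpu:%d' % i):
        with tf.name_scope('tower_%d' % i) as name_scope:
            with tf.variable_scope('model', reuse=bool(i != 0)):
                h = tf.layers.batch_normalization(x, training=True)
                pred = tf.layers.dense(h, 1)
                tower_losses.append(tf.losses.mean_squared_error(y, pred))
            # Only this tower's moving-average updates.
            all_update_ops.extend(
                tf.get_collection(tf.GraphKeys.UPDATE_OPS, name_scope))

optimizer = tf.train.GradientDescentOptimizer(0.1)
with tf.control_dependencies(all_update_ops):
    train_op = optimizer.minimize(tf.add_n(tower_losses) / num_gpus)

Restricting that extend call to i == 0 gives you back the first-tower-only heuristic from cifar10_main.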