Understanding the code that reuses variables across towers for multi-GPU training in TensorFlow

Posted: 2019-07-02 07:57:05

Tags: python tensorflow

Goal: I want to train a TensorFlow model on multiple GPUs, specifically with data parallelism.

What I have tried: I read and tried to understand the blog and the TensorFlow source code.

My difficulty: however, I find it hard to understand these lines of code

    # Reuse variables for the next tower.
    tf.get_variable_scope().reuse_variables()
within the following snippet:

    for i in xrange(FLAGS.num_gpus):
      with tf.device('/gpu:%d' % i):
        with tf.name_scope('%s_%d' % (cifar10.TOWER_NAME, i)) as scope:
          # Calculate the loss for one tower of the CIFAR model. This function
          # constructs the entire CIFAR model but shares the variables across
          # all towers.
          loss = tower_loss(scope)

          # Reuse variables for the next tower.
          tf.get_variable_scope().reuse_variables()  # TODO: hard to understand

          # Retain the summaries from the final tower.
          summaries = tf.get_collection(tf.GraphKeys.SUMMARIES, scope)

          # Calculate the gradients for the batch of data on this CIFAR tower.
          grads = opt.compute_gradients(loss)

          # Keep track of the gradients across all towers.
          tower_grads.append(grads)

You can also find the complete code in the blog and the TensorFlow source code.
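
For reference, here is a minimal toy example of what tf.get_variable_scope().reuse_variables() does to later tf.get_variable calls. This is my own sketch, not code from the blog or the tutorial; it assumes the TF 1.x API used in the question, and the scope and variable names are made up:

    import tensorflow as tf  # assumes the TF 1.x API, as in the question

    with tf.variable_scope('model'):
        # First call: creates a new variable named 'model/w'.
        w1 = tf.get_variable('w', shape=[2, 2])

        # Without this call, the second tf.get_variable('w', ...) below
        # would raise a ValueError because 'model/w' already exists.
        tf.get_variable_scope().reuse_variables()

        # Second call: returns the existing variable instead of creating
        # a new one.
        w2 = tf.get_variable('w', shape=[2, 2])

    print(w1 is w2)  # True -- both handles point to the same variable

So in the tower loop, calling reuse_variables() after the first tower appears to switch the enclosing variable scope into reuse mode, so that tower_loss for the later towers gets back the variables created by the first tower instead of creating new ones.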

The variables created for the different towers have different scope names, i.e., no two towers have the same variables. So I am confused: how can the variables in one tower be shared with the other towers, and which variables does the author actually want to share?
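
From experimenting a bit, the distinction between tf.name_scope and tf.variable_scope seems to be the key point, and my premise above may be wrong: the towers only differ by a name scope, and a name scope does not become part of the names of variables created with tf.get_variable. The sketch below is my own test (TF 1.x API assumed; the names are made up):

    import tensorflow as tf  # assumes the TF 1.x API, as in the question

    with tf.name_scope('tower_0'):
        v = tf.get_variable('weights', shape=[1])
        op = tf.multiply(v, 2.0, name='mul')

    print(v.name)   # 'weights:0'      -- name_scope does NOT prefix variables
    print(op.name)  # 'tower_0/mul:0'  -- name_scope DOES prefix ordinary ops

If that is correct, then every tower calls tf.get_variable in the same (root) variable scope, and reuse_variables() is what lets the second and later towers pick up the first tower's variables, while the per-tower name scopes only separate the ops and summaries. But I would like confirmation that this is the intended sharing mechanism.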

Any help would be greatly appreciated.

0 Answers:

No answers yet