Question

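I plan to run a very large recurrent network (e.g. 2048x5). Is it possible to place one layer on one GPU in TensorFlow? How should I implement the model for the best efficiency? I understand that inter-GPU or GPU-CPU-GPU communication carries overhead.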

Answer 1

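Splitting a large model across multiple GPUs is certainly possible in TensorFlow, but doing it optimally is a hard research problem. In general, you will need to do the following: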

Wrap large contiguous regions of your code in `with tf.device(...):` blocks, naming a different GPU for each:

```
with tf.device("/gpu:0"):
  # Define first layer.

with tf.device("/gpu:1"):
  # Define second layer.

# Define other layers, etc.
```
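
As a rough illustration of keeping contiguous regions together, the sketch below maps layer indices to device strings in contiguous blocks. The helper name `assign_layers_to_gpus` and the choice of 5 layers on 2 GPUs are assumptions made for the example, not part of TensorFlow; each returned string is the kind of name you would pass to `tf.device(...)`.

```python
def assign_layers_to_gpus(num_layers, num_gpus):
    """Return one device string per layer, grouping neighbouring layers.

    Hypothetical helper: layers are split into contiguous blocks so that
    only the boundaries between blocks need cross-device transfers.
    """
    if num_layers <= 0 or num_gpus <= 0:
        raise ValueError("num_layers and num_gpus must be positive")
    # Ceiling division: the largest number of layers any single GPU holds.
    layers_per_gpu = -(-num_layers // num_gpus)
    return ["/gpu:%d" % (i // layers_per_gpu) for i in range(num_layers)]


# Example: the 5-layer network from the question on an assumed 2 GPUs.
placement = assign_layers_to_gpus(num_layers=5, num_gpus=2)
for layer_index, device_name in enumerate(placement):
    print("layer %d -> %s" % (layer_index, device_name))
```

With one GPU per layer (`num_gpus=5`), every layer gets its own device, which matches the layout asked about in the question.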

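When building your optimizer, pass the optional argument `colocate_gradients_with_ops=True` to the `optimizer.minimize()` method, so that each gradient op is placed on the same device as the forward op it corresponds to: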

```
loss = ...
# Note the spelling: the class is AdagradOptimizer, not AdaGradOptimizer.
optimizer = tf.train.AdagradOptimizer(0.01)
train_op = optimizer.minimize(loss, colocate_gradients_with_ops=True)
```

(Optional.) You may need to enable "soft placement" in the `tf.ConfigProto` when you create your `tf.Session`, if any of the operations in your model cannot run on a GPU:
```
config = tf.ConfigProto(allow_soft_placement=True)
sess = tf.Session(config=config)
```
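
On the communication overhead raised in the question, one rough way to gauge it is to count the bytes that cross each device boundary. The sketch below is only a back-of-the-envelope estimate under stated assumptions: each layer hands a `[batch_size, hidden_size]` float32 activation to the next layer at every time step, the hidden size is the 2048 from the question, and the batch size of 64 is a made-up value. The function name is hypothetical.

```python
def boundary_bytes_per_step(batch_size, hidden_size, bytes_per_value=4):
    """Bytes of activations crossing one device boundary per time step.

    Assumes a dense [batch_size, hidden_size] tensor of float32 values
    (4 bytes each); the backward pass sends a gradient of the same shape.
    """
    return batch_size * hidden_size * bytes_per_value


# Assumed batch size of 64; hidden size 2048 as in the question.
forward_bytes = boundary_bytes_per_step(batch_size=64, hidden_size=2048)
print(forward_bytes / 2**20, "MiB per time step, per boundary, forward only")
```

Multiplying by the number of time steps and device boundaries, and doubling for the backward pass, gives a first estimate of the traffic to weigh against the available interconnect bandwidth.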