Question

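I'm trying to create a very large Keras model and distribute it across multiple GPUs. To be clear, I'm not trying to put multiple copies of the same model on multiple GPUs; I'm trying to build one large model that spans multiple GPUs. I've been using the multi_gpu_model function in Keras, but judging by the many out-of-memory errors I get when doing this, it seems to just replicate the model rather than distribute it the way I want.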
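A rough back-of-the-envelope sketch of why replication runs out of memory: the Keras documentation describes multi_gpu_model as single-machine data parallelism, so every GPU stores a full copy of the weights, whereas splitting the model stores only a share on each GPU. The function name, the 4-bytes-per-parameter (float32) assumption and the model size are illustrative only, and activations, gradients and optimizer state are ignored.

```python
def per_gpu_weight_bytes(num_params, num_gpus, replicate, bytes_per_param=4):
    """Approximate weight memory needed on each GPU (weights only)."""
    total = num_params * bytes_per_param
    if replicate:
        # Data parallelism: every GPU holds a complete copy of the model.
        return total
    # Model parallelism: the weights are split across the GPUs
    # (ceiling division so nothing is lost when it does not divide evenly).
    return -(-total // num_gpus)


# A hypothetical 2-billion-parameter model on 4 GPUs, in GB:
replicated = per_gpu_weight_bytes(2_000_000_000, 4, replicate=True)
split = per_gpu_weight_bytes(2_000_000_000, 4, replicate=False)
print(replicated / 1e9, split / 1e9)  # 8.0 2.0
```

Under these assumptions, replication needs the whole model's weights on every card, which is why adding GPUs does not help when the model itself is too big for one of them.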

I looked into Horovod, but because I have a lot of Windows-specific logging tools running, I'm reluctant to use it.

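That seems to leave only tf.estimators. The documentation doesn't make it clear how I would use these estimators to do what I'm trying to do. For example, which distribution strategy in tf.contrib.distribute would let me effectively split up the model the way I want?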

Is what I want to do possible with estimators, and if so, which strategy should I use?

Answer 1

You can use the Estimator API. Convert your model with tf.keras.estimator.model_to_estimator:

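import tensorflow as tf

# Let TensorFlow fall back to another device if an op can't run on the requested one
session_config = tf.ConfigProto(allow_soft_placement=True)
# MirroredStrategy keeps a copy of the model's variables on each of the 4 GPUs
distribute = tf.contrib.distribute.MirroredStrategy(num_gpus=4)
run_config = tf.estimator.RunConfig(train_distribute=distribute, session_config=session_config)
# your_keras_model must be a compiled Keras model
your_network = tf.keras.estimator.model_to_estimator(keras_model=your_keras_model, config=run_config)
your_network.train(input_fn)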

Don't forget to compile the model before converting it.

Answer 2

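You can use the TensorFlow backend to manually assign different parts of a Keras model to different GPUs. This guide provides detailed examples, and this article explains how to use Keras together with TensorFlow.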

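import tensorflow as tf
from tensorflow import keras

inputs = keras.Input(shape=(1024,))  # hypothetical input size

with tf.device("/device:GPU:0"):
    # Create the first part of your neural network
    x = keras.layers.Dense(4096, activation="relu")(inputs)

with tf.device("/device:GPU:1"):
    # Create the second part of your neural network
    x = keras.layers.Dense(4096, activation="relu")(x)

# ...

with tf.device("/device:GPU:n"):  # replace n with the index of your last GPU
    # Create the nth part of your neural network
    outputs = keras.layers.Dense(10, activation="softmax")(x)

model = keras.Model(inputs, outputs)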
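Deciding where to put each `with tf.device(...)` boundary is up to you. One simple heuristic, sketched below under the assumption that parameter count is a reasonable proxy for per-GPU memory, is to cut the ordered list of layers into contiguous chunks with roughly equal parameter totals. `assign_devices` and the example parameter counts are hypothetical; the returned strings use the same `/device:GPU:i` format as the code above.

```python
def assign_devices(layer_params, num_gpus):
    """Greedily split layers, in order, into at most num_gpus contiguous
    groups with roughly equal parameter counts.

    layer_params: parameter count of each layer, in model order.
    Returns one device string per layer.
    """
    target = sum(layer_params) / num_gpus  # ideal share of parameters per GPU
    devices = []
    gpu, running = 0, 0
    for params in layer_params:
        # Move to the next GPU once the current one has reached its share,
        # as long as there is still a GPU left to move to.
        if running >= target and gpu < num_gpus - 1:
            gpu += 1
            running = 0
        devices.append(f"/device:GPU:{gpu}")
        running += params
    return devices


print(assign_devices([4, 4, 4, 4], 2))
# ['/device:GPU:0', '/device:GPU:0', '/device:GPU:1', '/device:GPU:1']
```

Each returned string can then be passed to `tf.device` around the call that builds the corresponding layer. Keeping the groups contiguous means activations cross each device boundary only once per forward pass.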

Beware: communication latency between the CPU and the GPUs can add overhead to training.
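To get a feel for that overhead, you can estimate how much data crosses device boundaries per training step. The sketch below is a rough model under stated assumptions: float32 values (4 bytes), one forward activation plus one same-shaped gradient per boundary in the backward pass, and no overlap of transfers with compute. The function name and the example numbers are hypothetical.

```python
def boundary_transfer_bytes(batch_size, boundary_activation_sizes, bytes_per_value=4):
    """Approximate bytes copied between devices per training step.

    boundary_activation_sizes: values per example in each tensor that crosses
    from one device to another (e.g. the output width of the last layer on a
    GPU). Each crossing is counted twice: the forward activation and its
    gradient flowing back during backpropagation.
    """
    per_step = 0
    for size in boundary_activation_sizes:
        per_step += 2 * batch_size * size * bytes_per_value
    return per_step


# Hypothetical: batch of 256, two GPU boundaries each carrying a 4096-wide activation.
print(boundary_transfer_bytes(256, [4096, 4096]))  # 16777216
```

Placing boundaries where activations are narrow keeps this number small, which is one reason to choose split points deliberately rather than arbitrarily.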

Answer 3

You need device parallelism. This section of the Keras FAQ provides an example of how to do it with Keras:

import tensorflow as tf
from tensorflow import keras

# Model where a shared LSTM is used to encode two different sequences in parallel
input_a = keras.Input(shape=(140, 256))
input_b = keras.Input(shape=(140, 256))

shared_lstm = keras.layers.LSTM(64)

# Process the first sequence on one GPU
with tf.device('/gpu:0'):
    encoded_a = shared_lstm(input_a)
# Process the next sequence on another GPU
with tf.device('/gpu:1'):
    encoded_b = shared_lstm(input_b)

# Concatenate results on CPU
with tf.device('/cpu:0'):
    merged_vector = keras.layers.concatenate([encoded_a, encoded_b],
                                             axis=-1)
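Why this helps: the two `shared_lstm` calls do not depend on each other, so once they are placed on different GPUs they can run concurrently, and only the concatenation has to wait for both. The sketch below is a plain-Python illustration of that dependency reasoning, not TensorFlow's actual scheduler: it groups the ops of the example into stages whose members have no dependency on each other. The function name and op names are hypothetical.

```python
def parallel_stages(deps):
    """Group ops into stages; ops in the same stage do not depend on each
    other, so if they are placed on different devices they can run at once.

    deps maps each op name to the list of ops whose outputs it consumes.
    """
    depth = {}

    def visit(op):
        if op not in depth:
            # An op can start one stage after the latest of its inputs.
            depth[op] = 1 + max((visit(d) for d in deps[op]), default=-1)
        return depth[op]

    for op in deps:
        visit(op)
    stages = {}
    for op, d in depth.items():
        stages.setdefault(d, []).append(op)
    return [sorted(stages[d]) for d in sorted(stages)]


# Dependency structure of the shared-LSTM example (illustrative op names):
graph = {
    "input_a": [],
    "input_b": [],
    "encoded_a": ["input_a"],                     # on /gpu:0
    "encoded_b": ["input_b"],                     # on /gpu:1
    "merged_vector": ["encoded_a", "encoded_b"],  # on /cpu:0
}
print(parallel_stages(graph))
# [['input_a', 'input_b'], ['encoded_a', 'encoded_b'], ['merged_vector']]
```

`encoded_a` and `encoded_b` land in the same stage, which is exactly the pair the example puts on separate GPUs.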

Distributing a Keras model across multiple GPUs
