Question

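I built a semantic segmentation model in Keras (TensorFlow backend) and am trying to train it on Google Cloud ML Engine. I have around 200,000 images (256x256) to train on in small batches (10) for around 100 epochs. With only a master device of type complex_model_m_gpu, a single epoch took almost 25 hours.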

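I am not sure how Keras models adapt to multi-GPU training devices (e.g. complex_model_m_gpu). There is no documentation on this, only on distributed TensorFlow training. How can I best use the available resources on ML Engine to train my model quickly? How does using multiple workers affect the training process? When I add workers to my stack, the logs show that the master and the workers each run one epoch independently of one another, and each saves different checkpoints. This seems counterproductive.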

Answer 1

Using more than one GPU requires some modifications to your code. Here's one tutorial that you may find helpful. Note the following lines of code:

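import tensorflow as tf
from keras.utils import multi_gpu_model

# MiniGoogLeNet is the model class defined in the tutorial;
# G is the number of GPUs to train on.

# we'll store a copy of the model on *every* GPU and then combine
# the results from the gradient updates on the CPU
with tf.device("/cpu:0"):
    # initialize the model
    model = MiniGoogLeNet.build(width=32, height=32, depth=3,
        classes=10)

# make the model parallel
model = multi_gpu_model(model, gpus=G)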
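One thing worth checking when applying this to the question's setup: multi_gpu_model splits each input batch into sub-batches, one per GPU, so a batch of 10 leaves each GPU with very little work per step. The arithmetic below is a rough sketch using the numbers from the question; the GPU count of 4 and the helper names are assumptions made for illustration.

```python
import math


def per_gpu_batch(global_batch_size, num_gpus):
    """Average number of samples each GPU receives per step."""
    return global_batch_size / num_gpus


def steps_per_epoch(num_examples, global_batch_size):
    """Number of steps needed to see every example once."""
    return math.ceil(num_examples / global_batch_size)


num_images = 200_000   # from the question
batch_size = 10        # from the question
gpus = 4               # assumed GPU count, for illustration only

# With the original batch size, each GPU gets only a few samples per step.
print(per_gpu_batch(batch_size, gpus))              # 2.5
print(steps_per_epoch(num_images, batch_size))      # 20000

# Scaling the batch with the GPU count keeps 10 samples per GPU
# and cuts the number of steps per epoch.
print(steps_per_epoch(num_images, batch_size * gpus))  # 5000

# 25 hours for 20,000 steps works out to about 4.5 seconds per step.
print(25 * 3600 / steps_per_epoch(num_images, batch_size))  # 4.5
```

If memory allows, increasing the batch size roughly in proportion to the number of GPUs is the usual way to get a speedup from this kind of data parallelism; the learning rate may need retuning when you do.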

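Using a single machine with 1/2/4/8 GPUs rather than multiple machines typically gives better performance. However, if you want to scale beyond the number of GPUs in a single machine, call model_to_estimator and then call train_and_evaluate on the resulting Estimator. Keras is not multi-machine aware, so if you don't do this, each worker will try to run independently, as you observed.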
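A minimal sketch of that approach follows. It assumes a compiled tf.keras model, a TensorFlow version that still ships the Estimator API, and input functions that you supply; train_distributed, max_steps_for, and all parameter names are hypothetical names chosen for this example. train_and_evaluate reads the cluster layout from the TF_CONFIG environment variable, which ML Engine sets on each node, and model_dir should be a location all nodes can reach (for example a Cloud Storage path).

```python
import math


def max_steps_for(num_examples, batch_size, epochs):
    """Convert an epoch budget into the step count that TrainSpec expects."""
    return math.ceil(num_examples / batch_size) * epochs


def train_distributed(keras_model, train_input_fn, eval_input_fn,
                      model_dir, max_steps):
    """Wrap a compiled tf.keras model in an Estimator and train it.

    All arguments are supplied by the caller; nothing here is specific
    to the model in the question.
    """
    # Imported here so max_steps_for can be used without TensorFlow installed.
    import tensorflow as tf

    # Convert the compiled Keras model into an Estimator.
    estimator = tf.keras.estimator.model_to_estimator(
        keras_model=keras_model, model_dir=model_dir)

    # Training stops once the global step reaches max_steps.
    train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn,
                                        max_steps=max_steps)
    eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn)

    # Coordinates master and workers based on TF_CONFIG.
    tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
    return estimator


# Step budget for the question's numbers: 200,000 images, batch 10, 100 epochs.
print(max_steps_for(200_000, 10, 100))  # 2000000
```

Because the Estimator counts global steps rather than epochs, the epoch budget has to be converted to a step count up front, which is what max_steps_for does.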

Keras model training on ML Engine with multiple workers
