Question

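I want my model to run on multiple GPUs, sharing parameters but using different batches of data.

Can I do something like that with model.fit()? Are there any other options?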

Answer 1

Try the make_parallel function: https://github.com/kuza55/keras-extras/blob/master/utils/multi_gpu.py (it only works with the TensorFlow backend).

Answer 2

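Keras now has built-in support (as of v2.0.9) for device parallelism across multiple GPUs, via keras.utils.multi_gpu_model.

Currently, only the TensorFlow backend is supported.

There is a good example in the docs: https://keras.io/getting-started/faq/#how-can-i-run-a-keras-model-on-multiple-gpus It is also covered here: https://datascience.stackexchange.com/a/25737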

Answer 3

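Multi-GPU model training in Keras is now more convenient than ever. See the following document: Multi-GPU and distributed training.

Essentially, to do single-host, multi-device synchronous training with a Keras model, you use the tf.distribute.MirroredStrategy API. Here is how it works:

Instantiate a MirroredStrategy, optionally configuring which specific devices you want to use (by default the strategy will use all available GPUs).
Use the strategy object to open a scope, and within this scope create all the Keras objects you need that contain variables. Typically, that means creating and compiling the model inside the distribution scope.
Train the model via fit() as usual.

Schematically, it looks like this: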

import tensorflow as tf

# Create a MirroredStrategy.
strategy = tf.distribute.MirroredStrategy()
print('Number of devices: {}'.format(strategy.num_replicas_in_sync))

# Open a strategy scope.
with strategy.scope():
  # Everything that creates variables should be under the strategy scope.
  # In general this is only model construction & `compile()`.
  model = tf.keras.Model(...)
  model.compile(...)

# Train the model on all available devices.
model.fit(train_dataset, validation_data=val_dataset, ...)

# Test the model on all available devices.
model.evaluate(test_dataset)
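
For intuition about what "synchronous" training with shared parameters and different batches means, here is a toy sketch in plain Python. It is not how Keras or TensorFlow implement MirroredStrategy; the model (a one-variable linear fit) and the names grad_mse, split_batch and sync_step are made up for illustration. The global batch is split into one shard per replica, each replica computes a gradient on its own shard, the gradients are averaged, and a single update is applied to the shared parameters.

```python
# Toy illustration of synchronous data-parallel SGD.
# Assumption: this is a conceptual sketch only, not Keras/TensorFlow internals.

def grad_mse(w, b, shard):
    """Gradient of the mean squared error of y_hat = w * x + b on one shard."""
    n = len(shard)
    gw = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    gb = sum(2 * (w * x + b - y) for x, y in shard) / n
    return gw, gb


def split_batch(batch, num_replicas):
    """Split a global batch into equal-sized shards, one per replica."""
    size = len(batch) // num_replicas
    return [batch[i * size:(i + 1) * size] for i in range(num_replicas)]


def sync_step(w, b, global_batch, num_replicas, lr):
    """One synchronous step: per-replica gradients, averaged, one shared update."""
    shards = split_batch(global_batch, num_replicas)
    grads = [grad_mse(w, b, shard) for shard in shards]  # one per "device"
    gw = sum(g[0] for g in grads) / num_replicas          # average across replicas
    gb = sum(g[1] for g in grads) / num_replicas
    return w - lr * gw, b - lr * gb                       # same update everywhere


# Hypothetical data drawn from y = 3x + 1.
data = [(x / 10.0, 3 * x / 10.0 + 1) for x in range(8)]

w, b = 0.0, 0.0
for _ in range(2000):
    w, b = sync_step(w, b, data, num_replicas=2, lr=0.1)

print(round(w, 3), round(b, 3))
```

With equal-sized shards the averaged gradient equals the full-batch gradient, so two replicas take the same step a single device would take on the whole batch. As far as I know, the same idea applies to fit() under a MirroredStrategy: the batches produced by the dataset are treated as global batches and split across the replicas.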

How to do multi-GPU training with Keras?
