Question

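I want to train multiple models on multiple GPUs at the same time from a Jupyter notebook. I am working on a node with 4 GPUs. I would like to assign one GPU to each model and train 4 different models simultaneously. Right now, I select a GPU for a notebook with, for example: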

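import os
# Must run before TensorFlow is imported: this notebook will then see only GPU 1
os.environ['CUDA_VISIBLE_DEVICES'] = '1'

import tensorflow as tf

def build_model(...):
    ...

model = build_model(...)
model.fit(...)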

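in four different notebooks. But then the results and output of the fitting process are spread across four different notebooks, while running the models one after another in a single notebook takes a lot of time. How can I assign a GPU to each function and run them in parallel?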
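A minimal sketch of how the per-notebook approach above could be driven from one notebook: each model trains in its own child process whose CUDA_VISIBLE_DEVICES is set to a single GPU, and the parent waits for all of them and collects their output. It assumes each model's training code lives in a standalone script; the helpers python_job, run_on_gpu and run_all are hypothetical, not part of any library.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical helpers (not from any library) for running one training
# process per GPU and collecting all of their output in one place.


def python_job(gpu_id, *args):
    """Return a (gpu_id, command) pair that runs the current Python with args."""
    return gpu_id, [sys.executable, *args]


def run_on_gpu(gpu_id, cmd):
    """Run cmd in a child process that can only see GPU gpu_id.

    The variable is set before the child starts, so TensorFlow in the child
    initializes CUDA with just that GPU (which it then calls /GPU:0).
    """
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    result = subprocess.run(cmd, env=env, capture_output=True, text=True)
    return gpu_id, result.returncode, result.stdout, result.stderr


def run_all(jobs):
    """Start every (gpu_id, cmd) job at once and wait for all of them.

    Threads suffice here because each one only waits on its child process;
    results come back in the same order as jobs.
    """
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        return list(pool.map(lambda job: run_on_gpu(*job), jobs))
```

For example, run_all([python_job(i, "train.py", f"model{i}") for i in range(4)]) would start four trainings at once, assuming a hypothetical train.py that takes a model name; printing each returned stdout then shows all four training logs in the same notebook. Output only becomes available once each process exits, so for live progress the script would have to write to a per-GPU log file instead.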

Answer 1

I would suggest using TensorFlow device scopes, like this:

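import tensorflow as tf

# tf.device pins the ops created inside the block to the given GPU.
# Build each model inside its scope too, so its weights are placed on that GPU.
# Note: fit() blocks, so these calls still run one after another, not in parallel.
with tf.device('/GPU:0'):
    model1.fit(...)
with tf.device('/GPU:1'):
    model2.fit(...)
with tf.device('/GPU:2'):
    model3.fit(...)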
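To get actual parallelism from the device-scope approach inside a single notebook, the fit calls have to run concurrently, for example in threads. A minimal sketch, assuming TensorFlow 2.x (where the active tf.device scope is tracked per thread) and that all GPUs are visible to the notebook, i.e. CUDA_VISIBLE_DEVICES is not restricted to one GPU. train_in_parallel and _tf_gpu_scope are hypothetical helpers. Whether several Keras models train reliably in threads at once is not guaranteed; if it misbehaves, one process per GPU as in the question is the safer route.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical helpers, not part of TensorFlow or Keras.


def _tf_gpu_scope(i):
    """Default scope factory: place ops on GPU i."""
    import tensorflow as tf  # imported lazily so the helper has no hard TF dependency
    return tf.device(f"/GPU:{i}")


def train_in_parallel(train_fns, device_scope=_tf_gpu_scope):
    """Run train_fns[i] inside device_scope(i), one Python thread per job.

    train_fns: zero-argument callables; each should build AND fit its model,
        so the model's variables are created inside that GPU's scope.
    device_scope: maps a GPU index to a context manager; it is a parameter
        only so the helper can be exercised without GPUs.
    Returns the callables' return values in GPU order.
    """

    def run(i):
        # Each worker thread enters its own scope before building/fitting.
        with device_scope(i):
            return train_fns[i]()

    with ThreadPoolExecutor(max_workers=len(train_fns)) as pool:
        return list(pool.map(run, range(len(train_fns))))
```

For example, each entry of train_fns could be a small function that calls a hypothetical build_model() and returns model.fit(...).history; the returned list is in GPU order, so all the histories end up back in the one notebook.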

Answer 2

If you want to train the models on different cloud GPUs (for example, GPU instances from AWS), try this library:

!pip install aibro==0.0.45 --extra-index-url https://test.pypi.org/simple

from aibro.train import fit
machine_id = 'g4dn.4xlarge'  # EC2 instance type on AWS
job_id, trained_model, history = fit(
    model=model,
    train_X=train_X,
    train_Y=train_Y,
    validation_data=(validation_X, validation_Y),
    machine_id=machine_id
)

Tutorial: https://colab.research.google.com/drive/19sXZ4kbic681zqEsrl_CZfB5cegUwuIB#scrollTo=ERqoHEaamR1Y

Training multiple Keras/TensorFlow models on different GPUs simultaneously
