With mpirun -np X and TensorFlow, is X limited by the number of GPUs?

Asked: 2017-09-25 00:53:21

Tags: python tensorflow mpi gpu

I am trying to use MPI with TensorFlow. For an example of this kind of code, see this OpenAI baselines PPO code. It tells us to run the following command:

$ mpirun -np 8 python -m baselines.ppo1.run_atari

I have a machine with one GPU (12 GB of memory), TensorFlow 1.3.0 installed, and Python 3.5.3. When I run this code, I get the following error:

2017-09-24 17:29:12.975967: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:01:00.0
Total memory: 11.90GiB
Free memory: 11.17GiB
2017-09-24 17:29:12.975990: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2017-09-24 17:29:12.975996: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2017-09-24 17:29:12.976011: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:01:00.0)
2017-09-24 17:29:12.987133: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:01:00.0
Total memory: 11.90GiB
Free memory: 11.17GiB
2017-09-24 17:29:12.987159: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2017-09-24 17:29:12.987165: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2017-09-24 17:29:12.987172: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:01:00.0)
[2017-09-24 17:29:12,994] Making new env: PongNoFrameskip-v4
2017-09-24 17:29:13.017845: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-09-24 17:29:13.022347: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
name: TITAN X (Pascal)
major: 6 minor: 1 memoryClockRate (GHz) 1.531
pciBusID 0000:01:00.0
Total memory: 11.90GiB
Free memory: 104.81MiB
2017-09-24 17:29:13.022394: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2017-09-24 17:29:13.022415: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2017-09-24 17:29:13.022933: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:01:00.0)
2017-09-24 17:29:13.026338: E tensorflow/stream_executor/cuda/cuda_driver.cc:924] failed to allocate 104.81M (109903872 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY

(This is only the first part of the error message; it is very long, but I believe this opening portion is what matters.)

However, the command works if I run it with mpirun -np 1.

Searching online, I found a repository from Uber which says that "to run on a machine with 4 GPUs" I need to use:

$ mpirun -np 4 python train.py

I just want to confirm that in mpirun -np X, X is limited by the number of GPUs on the machine, assuming the program we are running uses TensorFlow.

1 answer:

Answer 0 (score: 0):

After reading more about MPI, I can confirm that yes, the number of processes is indeed limited by the number of GPUs. Reasoning:

  • The mpirun -np X command will run X "copies" of the code (but each with its own rank). See the documentation here
  • Each run of the code needs a GPU
  • TensorFlow only allows one program to use a given GPU at a time. In other words, you cannot run mpirun -np X and python tf_program1.py simultaneously if they both use TensorFlow and each needs its own GPU on your machine.

So it seems I am forced to use a single process.