I am working on a Linux machine shared by multiple users. The machine has four different GPU devices. I want to run a TensorFlow network and a TensorBoard monitoring process simultaneously on one of the open GPUs. Currently the main GPU (GPU 0) is fully loaded with some other user's processes:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.46 Driver Version: 390.46 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:05:00.0 Off | N/A |
| 50% 84C P2 216W / 250W | 11124MiB / 11178MiB | 97% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 108... Off | 00000000:06:00.0 Off | N/A |
| 23% 33C P8 17W / 250W | 10845MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX 108... Off | 00000000:09:00.0 Off | N/A |
| 23% 32C P8 16W / 250W | 10845MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 GeForce GTX 108... Off | 00000000:0A:00.0 Off | N/A |
| 23% 27C P8 16W / 250W | 10845MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
When I try to run TensorBoard with the following command:
tensorboard --logdir=path_to_directory
I get a CUDA_ERROR_OUT_OF_MEMORY error. This is apparently because TensorBoard is trying to run on the fully loaded GPU. Is it possible to run TensorBoard on one of the open GPUs?
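For what it's worth, the approach I was considering is to hide GPU 0 from the process by setting CUDA_VISIBLE_DEVICES so that only an idle GPU (e.g. GPU 1 from the nvidia-smi output above) is visible, but I have not confirmed whether TensorBoard's TensorFlow backend respects this in my setup:

# restrict the process to GPU 1 only (indices follow nvidia-smi ordering)
CUDA_VISIBLE_DEVICES=1 tensorboard --logdir=path_to_directory

Is this the right way to do it, or is there a TensorBoard-specific option for selecting the GPU?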