I am trying to use multiple GPUs with Keras, like this:
from keras.utils import multi_gpu_model
parallel_model = multi_gpu_model(model, gpus=args.gpu_num)
However, I get this error:
File "/usr/local/lib/python3.6/dist-packages/keras/utils/multi_gpu_utils.py", line 179, in multi_gpu_model
available_devices))
ValueError: To call `multi_gpu_model` with `gpus=2`, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1']. However this machine only has: ['/cpu:0', '/xla_cpu:0']. Try reducing `gpus`.
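For context, the check that raises this error can be sketched in plain Python. This is a hypothetical simplification of what `multi_gpu_model` does internally (the function name `check_devices` is mine, not Keras's), but it shows why the error fires whenever TensorFlow itself does not list the GPUs:

```python
def check_devices(available_devices, gpus):
    """Roughly mimic the device check in multi_gpu_model: it expects
    /cpu:0 plus /gpu:0 .. /gpu:{gpus-1} to be visible to TensorFlow."""
    expected = ['/cpu:0'] + ['/gpu:%d' % i for i in range(gpus)]
    missing = [d for d in expected if d not in available_devices]
    if missing:
        raise ValueError('Missing devices: %s (machine only has: %s)'
                         % (missing, available_devices))

# With only CPU devices visible, as on this machine, the check fails:
try:
    check_devices(['/cpu:0', '/xla_cpu:0'], gpus=2)
except ValueError as e:
    print(e)
```

So the question is not whether the GPUs physically exist, but whether TensorFlow's device list contains them.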
Running nvidia-smi does show me the expected GPUs:
Wed Oct 14 22:42:27 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100 Driver Version: 440.100 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00001B8A:00:00.0 Off | 0 |
| N/A 37C P0 41W / 300W | 0MiB / 32510MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 0000E84D:00:00.0 Off | 0 |
| N/A 41C P0 41W / 300W | 0MiB / 32510MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
However, when I try to list the GPUs visible to Keras/TF, I don't see them:
from keras import backend as K
print([x.name for x in K.get_session().list_devices()])
['/job:localhost/replica:0/task:0/device:CPU:0', '/job:localhost/replica:0/task:0/device:XLA_CPU:0']
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 14468776653555675542
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 4031268204736357987
physical_device_desc: "device: XLA_CPU device"
]
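One quick thing worth checking (an assumption on my part: a CPU-only TensorFlow wheel would explain exactly this symptom, since it never exposes GPU devices no matter what nvidia-smi reports) is whether the installed build was compiled with CUDA support at all. `tf.test.is_built_with_cuda()` exists in both TF 1.x and 2.x; the helper name `cuda_build_status` is mine:

```python
def cuda_build_status():
    """Return a string describing whether the installed TensorFlow
    wheel was built with CUDA support (or report TF as missing).
    Note: on TF 1.x, GPU support shipped in the separate
    `tensorflow-gpu` package, not the plain `tensorflow` wheel."""
    try:
        import tensorflow as tf
        return 'Built with CUDA: %s' % tf.test.is_built_with_cuda()
    except ImportError:
        return 'TensorFlow is not installed in this environment'

print(cuda_build_status())
```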
Any clues on how to solve this?