I am trying to use multiple GPUs with Keras, like this:
from keras.utils import multi_gpu_model
parallel_model = multi_gpu_model(model, gpus=args.gpu_num)
However, I get this error:
File "/usr/local/lib/python3.6/dist-packages/keras/utils/multi_gpu_utils.py", line 179, in multi_gpu_model
available_devices))
ValueError: To call `multi_gpu_model` with `gpus=2`, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1']. However this machine only has: ['/cpu:0', '/xla_cpu:0']. Try reducing `gpus`.
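For context, the check that raises this error can be sketched in plain Python. This is a hypothetical simplification of what `multi_gpu_model` does internally (the function name `check_devices` is mine, not Keras's), but it shows why the error fires whenever TensorFlow itself does not list the GPUs:

```python
def check_devices(available_devices, gpus):
    """Roughly mimic the device check in multi_gpu_model: it expects
    /cpu:0 plus /gpu:0 .. /gpu:{gpus-1} to be visible to TensorFlow."""
    expected = ['/cpu:0'] + ['/gpu:%d' % i for i in range(gpus)]
    missing = [d for d in expected if d not in available_devices]
    if missing:
        raise ValueError('Missing devices: %s (machine only has: %s)'
                         % (missing, available_devices))

# With only CPU devices visible, as on this machine, the check fails:
try:
    check_devices(['/cpu:0', '/xla_cpu:0'], gpus=2)
except ValueError as e:
    print(e)
```

So the question is not whether the GPUs physically exist, but whether TensorFlow's device list contains them.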
Running nvidia-smi does show me the expected GPUs:
Wed Oct 14 22:42:27 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100 Driver Version: 440.100 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00001B8A:00:00.0 Off | 0 |
| N/A 37C P0 41W / 300W | 0MiB / 32510MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... On | 0000E84D:00:00.0 Off | 0 |
| N/A 41C P0 41W / 300W | 0MiB / 32510MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
However, when I try to list the GPUs visible to Keras/TF, I don't see them:
from keras import backend as K
print([x.name for x in K.get_session().list_devices()])
['/job:localhost/replica:0/task:0/device:CPU:0', '/job:localhost/replica:0/task:0/device:XLA_CPU:0']
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 14468776653555675542
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 4031268204736357987
physical_device_desc: "device: XLA_CPU device"
]
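One quick thing worth checking (an assumption on my part: a CPU-only TensorFlow wheel would explain exactly this symptom, since it never exposes GPU devices no matter what nvidia-smi reports) is whether the installed build was compiled with CUDA support at all. `tf.test.is_built_with_cuda()` exists in both TF 1.x and 2.x; the helper name `cuda_build_status` is mine:

```python
def cuda_build_status():
    """Return a string describing whether the installed TensorFlow
    wheel was built with CUDA support (or report TF as missing).
    Note: on TF 1.x, GPU support shipped in the separate
    `tensorflow-gpu` package, not the plain `tensorflow` wheel."""
    try:
        import tensorflow as tf
        return 'Built with CUDA: %s' % tf.test.is_built_with_cuda()
    except ImportError:
        return 'TensorFlow is not installed in this environment'

print(cuda_build_status())
```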
Any clues on how to solve this?