I have a machine with 8 GPUs (4 GTX 1080 Ti with 11 GB of RAM and 4 RTX 1080), and I cannot get TensorFlow to use them correctly (or at all).
When I run
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
it prints
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 5295519098812813462
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 12186007115805339517
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:1"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 17706271046686153881
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:2"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 14710290295129432533
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:3"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 1381213064943868400
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:4"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 12093982778662340719
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:5"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 682960671898108683
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:6"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 9901240111105546679
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_GPU:7"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 8442134369143872649
physical_device_desc: "device: XLA_GPU device"
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 1687638086072792879
physical_device_desc: "device: XLA_CPU device"
].
If I try to use the GPUs for anything, nvidia-smi shows them as occupied but at 0% utilization, and the speed of the task indicates that TensorFlow is only using the CPU.
On other machines with the same setup, the same call prints both '/device:GPU:2'
and '/device:XLA_GPU:2'
(for example), and TensorFlow uses the GPUs without any problem.
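To make the difference between the two machines easier to spot, here is a minimal sketch that lists GPU indices which show up as XLA_GPU but have no matching plain GPU device. The helper name gpus_missing_regular_device is hypothetical (it is not part of TensorFlow), and it only inspects device name strings such as those printed above.

```python
import re


def gpus_missing_regular_device(device_names):
    """Return sorted GPU indices that appear as XLA_GPU but not as a plain GPU device."""
    pattern = re.compile(r"^/device:(XLA_GPU|GPU):(\d+)$")
    seen = {"GPU": set(), "XLA_GPU": set()}
    for name in device_names:
        match = pattern.match(name)
        if match:  # CPU and XLA_CPU entries do not match and are skipped
            seen[match.group(1)].add(int(match.group(2)))
    return sorted(seen["XLA_GPU"] - seen["GPU"])


# Device names copied from the output in the question (the problem machine).
broken = (
    ["/device:CPU:0"]
    + ["/device:XLA_GPU:%d" % i for i in range(8)]
    + ["/device:XLA_CPU:0"]
)
print(gpus_missing_regular_device(broken))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

On a live system the input could be built with [d.name for d in device_lib.list_local_devices()]. An empty result would correspond to the working machines, where every XLA_GPU entry has a matching GPU entry.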
I have seen similar questions and their solutions, but none of them seem to work.
Answer 0 (score: 1)
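Most likely you have an incompatible version of CUDA installed. If you installed tensorflow with pip, check https://www.tensorflow.org/install/gpu to see which CUDA version (and which cuDNN version) corresponds to your tensorflow version. Make sure you have installed the correct
versions of tensorflow, CUDA, and cuDNN. Alternatively, you can build TensorFlow from source, but I don't have much experience with that, so you will have to search for instructions yourself :) Good luck!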
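As a rough way to gather the version information mentioned above, the following sketch reports what the installed TensorFlow build was compiled against. The function name report_tf_cuda_build is hypothetical. It assumes that tf.sysconfig.get_build_info() is only present in newer releases (around TF 2.3 and later), so the call is guarded, and the keys it returns may differ between builds.

```python
def report_tf_cuda_build():
    """Return a dict describing the installed TensorFlow build, or None if TF is not importable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None

    info = {
        "tf_version": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
    }
    # Assumption: get_build_info() may be missing on older releases, so look it up defensively.
    get_build_info = getattr(tf.sysconfig, "get_build_info", None)
    if get_build_info is not None:
        build = get_build_info()
        # These keys may be absent on CPU-only builds, hence .get().
        info["cuda_version"] = build.get("cuda_version")
        info["cudnn_version"] = build.get("cudnn_version")
    return info


if __name__ == "__main__":
    print(report_tf_cuda_build())
```

The reported values can then be compared with the compatibility table on the install page and with the CUDA and cuDNN versions actually installed on the machine.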