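This one completely baffles me. I have been trying to run GPU-accelerated applications in Docker, but I keep hitting a missing libcuda.so.1 error. While troubleshooting, I found the following.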
sudo nvidia-docker run --rm nvidia/cuda:9.0-devel nvidia-smi
gives...
Tue Jun 19 14:21:16 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:41:00.0 Off |                  N/A |
|  0%   31C    P0    26W / 250W |      0MiB / 11169MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
...but if I build a container from the same image...
FROM nvidia/cuda:9.0-devel
RUN apt-get update && apt-get install -y python3 python3-dev python3-pip python3-cffi libcairo2-dev python-cairo python3-tk
RUN pip3 install cairocffi editdistance numpy scipy matplotlib keras tensorflow-gpu
ENTRYPOINT ["tail", "-f", "/dev/null"]
...and try to run nvidia-smi in it, the command is not there.
sudo nvidia-docker exec 5961ce38b1ef nvidia-smi
OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused "exec: \"nvidia-smi\": executable file not found in $PATH": unknown
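In this example I have checked that the container ID is correct. The same thing happens if I open a shell inside the container and run the command there.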
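To narrow down what the container can actually see, a small check like the one below could be run with python3 inside it. This is only a sketch: the function name check_gpu_runtime and the keys of the returned dictionary are made up for illustration, and it uses nothing beyond the Python standard library. It reports whether nvidia-smi is on PATH and whether the dynamic loader can open libcuda.so.1, which are the two symptoms described above.

```python
import ctypes
import shutil


def check_gpu_runtime(lib_name="libcuda.so.1", tool="nvidia-smi"):
    """Report whether the driver library and the nvidia-smi tool are visible here.

    The function name and the report keys are hypothetical, chosen for this sketch.
    """
    report = {
        "tool_path": shutil.which(tool),  # None if the tool is not on PATH
        "lib_loaded": False,
        "lib_error": None,
    }
    try:
        # Ask the dynamic loader to open the library by name.
        ctypes.CDLL(lib_name)
        report["lib_loaded"] = True
    except OSError as exc:
        # Raised when the loader cannot find or open the library.
        report["lib_error"] = str(exc)
    return report


if __name__ == "__main__":
    # str.format is used instead of f-strings so this also runs on older Python 3.
    for key, value in sorted(check_gpu_runtime().items()):
        print("{}: {}".format(key, value))
```

If both checks fail inside the container while nvidia-smi works on the host, that would suggest the driver files are not being made available to this particular container, rather than a problem with the tensorflow-gpu install itself.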
How do I get tensorflow-gpu working in the container?