TL;DR
What is the correct way to install CUDA 10.0 on Google Colab and replace the preinstalled CUDA 10.1?
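Longer:
Recently Google appears to have updated some drivers on Colab (CUDA 10.1 is now installed), and as a result TensorFlow 1.14, which my project requires (1.15 has problems when exporting the model), no longer detects the GPU.
When I now try to run TF 1.14 on Colab, I get the following errors: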
Nov 19, 2019, 9:51:52 AM WARNING 2019-11-19 14:51:52.917613: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
Nov 19, 2019, 9:51:52 AM WARNING 2019-11-19 14:51:52.917159: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
Nov 19, 2019, 9:51:52 AM WARNING 2019-11-19 14:51:52.916864: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
Nov 19, 2019, 9:51:52 AM WARNING 2019-11-19 14:51:52.916019: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
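The warnings show TensorFlow 1.14 looking for `*.so.10.0` libraries, i.e. a CUDA 10.0 toolkit, while the runtime now ships 10.1. A minimal sketch to confirm which toolkit `/usr/local/cuda` currently points to, assuming the CUDA 10.x layout where the toolkit includes a plain-text `version.txt` (newer toolkits may use a different file). The helper names are hypothetical.

```python
import re
from pathlib import Path

# CUDA 10.x toolkits ship a text file like "CUDA Version 10.1.243" here.
# Assumption: this sketch only targets the 10.x layout.
VERSION_FILE = Path("/usr/local/cuda/version.txt")

# TensorFlow 1.14 GPU builds link against CUDA 10.0, hence the *.so.10.0 names in the log.
TF114_CUDA = (10, 0)


def parse_cuda_version(text):
    """Extract (major, minor) from a 'CUDA Version X.Y.Z' line, or None if absent."""
    match = re.search(r"CUDA Version (\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_version(path=VERSION_FILE):
    """Read the toolkit version behind /usr/local/cuda; None if the file is missing."""
    try:
        return parse_cuda_version(path.read_text())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_cuda_version()
    print("installed toolkit:", found)
    print("matches TF 1.14 (CUDA 10.0):", found == TF114_CUDA)
```

On the updated runtime this would be expected to report `(10, 1)`; after a successful switch it should report `(10, 0)`.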
What is the correct way to fix this?
I tried:
!ln -sf /usr/local/cuda/lib64/libcudart.so.10.1.243 /usr/local/cuda/lib64/libcudart.so.10.0
!ln -sf /usr/local/cuda/lib64/libcusparse.so.10.3.0.243 /usr/local/cuda/lib64/libcusparse.so.10.0
!ln -sf /usr/local/cuda/lib64/libcusolver.so.10.2.0.243 /usr/local/cuda/lib64/libcusolver.so.10.0
!ln -sf /usr/local/cuda/lib64/libcurand.so.10.1.1.243 /usr/local/cuda/lib64/libcurand.so.10.0
!ln -sf /usr/local/cuda/lib64/libcufft.so.10.1.1.243 /usr/local/cuda/lib64/libcufft.so.10.0
!apt-get --purge remove cuda nvidia* libnvidia-*
!dpkg -l | grep cuda- | awk '{print $2}' | xargs -n1 dpkg --purge
!apt-get remove cuda-*
!apt autoremove
!wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
!sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
!sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
!sudo apt-get update
!wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
!sudo apt install -y ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
!sudo apt-get update
# Install NVIDIA driver
#!sudo apt-get install --no-install-recommends nvidia-driver-418
!sudo apt-get -y install nvidia-driver-418
# Reboot. Check that GPUs are visible using the command: nvidia-smi
# Install development and runtime libraries (~4GB)
#!sudo apt-get install --no-install-recommends \
!sudo apt-get install -y \
cuda-10-0 \
libcudnn7=7.6.2.24-1+cuda10.0 \
libcudnn7-dev=7.6.2.24-1+cuda10.0
# Install TensorRT. Requires that libcudnn7 is installed above.
# !sudo apt-get install -y --no-install-recommends libnvinfer5=5.1.5-1+cuda10.0 \
!sudo apt-get install -y libnvinfer5=5.1.5-1+cuda10.0 \
libnvinfer-dev=5.1.5-1+cuda10.0
!apt --fix-broken install
and also updated LD_LIBRARY_PATH, but none of this got me anywhere.
What is the correct way to switch to CUDA 10.0 so that I can run TF 1.14?
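Because each warning comes from `dlopen` failing, one way to check whether an install step or symlink actually helped, without restarting TensorFlow, is to try loading the same sonames with `ctypes`, which calls `dlopen` under the hood. A minimal sketch; the list of names is taken from the log and the symlink attempts above, and `check_libs` is a hypothetical helper.

```python
import ctypes

# Sonames TensorFlow 1.14 tries to dlopen (from the log and the symlink attempts).
REQUIRED_LIBS = [
    "libcudart.so.10.0",
    "libcublas.so.10.0",
    "libcufft.so.10.0",
    "libcurand.so.10.0",
    "libcusolver.so.10.0",
    "libcusparse.so.10.0",
]


def check_libs(names):
    """Return {soname: bool} depending on whether dlopen() succeeds for each name."""
    results = {}
    for name in names:
        try:
            # A bare soname makes dlopen search LD_LIBRARY_PATH, the ldconfig
            # cache and the default directories, roughly as TensorFlow's loader does.
            ctypes.CDLL(name)
            results[name] = True
        except OSError:
            results[name] = False
    return results


if __name__ == "__main__":
    for lib, ok in check_libs(REQUIRED_LIBS).items():
        print(f"{lib}: {'found' if ok else 'MISSING'}")
```

One caveat: the dynamic loader reads LD_LIBRARY_PATH when a process starts, so changing `os.environ["LD_LIBRARY_PATH"]` inside an already-running notebook kernel does not affect later `dlopen` calls in that kernel. This may be one reason updating the variable appeared to do nothing; restarting the runtime (or running `ldconfig` after installing) is needed for new library locations to be picked up.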
Answer (score: 1)
I solved this. In short, use --allow-change-held-packages, because Google Colab marks the CUDA packages as held. The full solution is at the bottom of the edited question above.
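The author's final package list is not reproduced here, so the following is only a sketch of the idea: first list what apt has on hold, then pass `--allow-change-held-packages` to the install step. The package pins are copied from the install attempt in the question and are an example, not the confirmed fix; the helper names are hypothetical.

```python
import shlex
import subprocess

# Package pins copied from the install attempt in the question (example only).
PACKAGES = [
    "cuda-10-0",
    "libcudnn7=7.6.2.24-1+cuda10.0",
    "libcudnn7-dev=7.6.2.24-1+cuda10.0",
]


def held_packages():
    """List packages apt has marked as held (`apt-mark showhold`); [] if unavailable."""
    try:
        out = subprocess.run(
            ["apt-mark", "showhold"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return []
    return out.stdout.split()


def install_cmd(packages):
    """Build an apt-get invocation that is allowed to change held packages."""
    return ["apt-get", "install", "-y", "--allow-change-held-packages", *packages]


if __name__ == "__main__":
    print("held packages:", held_packages())
    # Printed rather than executed: review it, then run it as root.
    print(shlex.join(install_cmd(PACKAGES)))
```

In a Colab cell the printed command can be run with a leading `!`, the same way as the other commands in the question. Without the flag, apt refuses to upgrade, downgrade or remove a held package, which would explain why the earlier install attempts did not take effect.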