I'm trying to compile xgboost for GPU, but it looks like my CUDA installation is broken.
~$ cmake .. -DUSE_CUDA=ON
CMake Error at /usr/share/cmake-3.5/Modules/FindPackageHandleStandardArgs.cmake:148 (message):
Could NOT find CUDA: Found unsuitable version "7.5", but required is at
least "8.0" (found /usr)
Call Stack (most recent call first):
/usr/share/cmake-3.5/Modules/FindPackageHandleStandardArgs.cmake:386 (_FPHSA_FAILURE_MESSAGE)
/usr/share/cmake-3.5/Modules/FindCUDA.cmake:949 (find_package_handle_standard_args)
CMakeLists.txt:113 (find_package)
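I originally installed CUDA 7.5 and later installed CUDA 9.1. I tried to uninstall 7.5, but I may have missed something. I ran the following commands to check my CUDA version.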
~$ which nvcc
/usr/bin/nvcc
~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17
~$ cat /usr/local/cuda/version.txt
CUDA Version 9.1.85
~$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 390.30 Wed Jan 31 22:08:49 PST 2018
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.6)
~$ nvidia-smi
Wed Feb 21 00:35:35 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30 Driver Version: 390.30 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:01:00.0 On | N/A |
| 25% 46C P2 56W / 250W | 487MiB / 11175MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
This question suggests purging the CUDA files from /usr/bin, so I removed the following files.
~$ ls /usr/local/cuda-9.1/bin
bin2c cuda-gdbserver nsight nvprof
computeprof cuda-install-samples-9.1.sh nsight_ee_plugins_manage.sh nvprune
crt cuda-memcheck nvcc nvvp
cudafe cuobjdump nvcc.profile ptxas
cudafe++ fatbinary nvdisasm uninstall_cuda_9.1.pl
cuda-gdb gpu-library-advisor nvlink
~$ cd /usr/bin
~$ ls /usr/local/cuda-9.1/bin | sudo xargs rm
rm: cannot remove 'computeprof': No such file or directory
rm: cannot remove 'crt': No such file or directory
rm: cannot remove 'gpu-library-advisor': No such file or directory
rm: cannot remove 'nsight': No such file or directory
rm: cannot remove 'nsight_ee_plugins_manage.sh': No such file or directory
rm: cannot remove 'nvcc.profile': No such file or directory
rm: cannot remove 'uninstall_cuda_9.1.pl': No such file or directory
Following that question, I also added the following to ~/.bashrc:
export PATH=/usr/local/cuda-9.1/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.1/lib64\
${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
After these changes, the system correctly picks up CUDA 9.1. The other diagnostic commands give the same output as before.
~$ which nvcc
/usr/local/cuda-9.1/bin/nvcc
~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
However, running cmake .. -DUSE_CUDA=ON still fails with the same error. I tried rebooting the computer, but that didn't help.
How can I get this to work?
Answer 0 (score: 1)
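Figured it out...
I deleted the xgboost directory, re-cloned it from GitHub, and then ran make. Were some leftover files from the earlier configuration run getting in the way?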
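A plausible explanation for the leftover files: CMake's FindCUDA module stores what it finds (for example CUDA_TOOLKIT_ROOT_DIR) in CMakeCache.txt inside the build directory, so a later cmake run in the same directory can keep reusing the old 7.5 location even after PATH has been fixed. If that is what happened here, clearing the cache may be enough, without re-cloning. The sketch below is only an illustration under that assumption; reset_cmake_cache is a hypothetical helper name, and the build directory name and CUDA path in the usage comment are guesses based on the question.

```shell
# Hypothetical helper: print cached CUDA entries, then remove CMake's cache
# so the next configure run searches for CUDA from scratch.
reset_cmake_cache() {
    build_dir="$1"
    if [ -f "$build_dir/CMakeCache.txt" ]; then
        # Show what was cached (e.g. CUDA_TOOLKIT_ROOT_DIR) before deleting it.
        grep -i '^CUDA' "$build_dir/CMakeCache.txt" || true
    fi
    # Remove only the cache file and CMake's bookkeeping directory.
    rm -rf "$build_dir/CMakeCache.txt" "$build_dir/CMakeFiles"
}

# Usage, assuming the build directory is xgboost/build and CUDA 9.1 is in
# /usr/local/cuda-9.1 (both taken from the question, not verified):
#   reset_cmake_cache build
#   cd build && cmake .. -DUSE_CUDA=ON -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-9.1
```

If the printed entries point at /usr rather than /usr/local/cuda-9.1, that would be consistent with the stale-cache explanation. Passing CUDA_TOOLKIT_ROOT_DIR explicitly is optional, but it removes any dependence on PATH ordering.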