CUDA version for building xgboost

Time: 2018-02-21 07:32:26

Tags: cmake cuda

I'm trying to compile xgboost with GPU support, and it looks like my CUDA installation is broken.

~$ cmake .. -DUSE_CUDA=ON
CMake Error at /usr/share/cmake-3.5/Modules/FindPackageHandleStandardArgs.cmake:148 (message):
  Could NOT find CUDA: Found unsuitable version "7.5", but required is at
  least "8.0" (found /usr)
Call Stack (most recent call first):
  /usr/share/cmake-3.5/Modules/FindPackageHandleStandardArgs.cmake:386 (_FPHSA_FAILURE_MESSAGE)
  /usr/share/cmake-3.5/Modules/FindCUDA.cmake:949 (find_package_handle_standard_args)
  CMakeLists.txt:113 (find_package)

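I originally installed CUDA 7.5 and later installed CUDA 9.1. I tried to uninstall 7.5, but I may have missed something. I ran the following commands to check my CUDA version.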

~$ which nvcc
/usr/bin/nvcc

~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17

~$ cat /usr/local/cuda/version.txt
CUDA Version 9.1.85

~$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  390.30  Wed Jan 31 22:08:49 PST 2018
GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.6) 

~$ nvidia-smi
Wed Feb 21 00:35:35 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30                 Driver Version: 390.30                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:01:00.0  On |                  N/A |
| 25%   46C    P2    56W / 250W |    487MiB / 11175MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

This question suggested clearing the CUDA files out of /usr/bin, so I removed the following files from there.

~$ ls /usr/local/cuda-9.1/bin
bin2c        cuda-gdbserver               nsight                       nvprof
computeprof  cuda-install-samples-9.1.sh  nsight_ee_plugins_manage.sh  nvprune
crt          cuda-memcheck                nvcc                         nvvp
cudafe       cuobjdump                    nvcc.profile                 ptxas
cudafe++     fatbinary                    nvdisasm                     uninstall_cuda_9.1.pl
cuda-gdb     gpu-library-advisor          nvlink

~$ cd /usr/bin
~$ ls /usr/local/cuda-9.1/bin | sudo xargs rm
rm: cannot remove 'computeprof': No such file or directory
rm: cannot remove 'crt': No such file or directory
rm: cannot remove 'gpu-library-advisor': No such file or directory
rm: cannot remove 'nsight': No such file or directory
rm: cannot remove 'nsight_ee_plugins_manage.sh': No such file or directory
rm: cannot remove 'nvcc.profile': No such file or directory
rm: cannot remove 'uninstall_cuda_9.1.pl': No such file or directory

Following that question, I added the new paths to ~/.bashrc:

export PATH=/usr/local/cuda-9.1/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

After these changes, the system correctly picks up CUDA 9.1. The output of the other diagnostic commands is unchanged.

~$ which nvcc
/usr/local/cuda-9.1/bin/nvcc

~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

However, running cmake .. -DUSE_CUDA=ON still fails with the same error. I tried rebooting the machine, but that did not help.

How can I get this to work?

1 answer:

Answer 0 (score: 1)

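Got it working...

I deleted the xgboost directory, re-cloned it from GitHub, and then ran make. Maybe some leftover files from the earlier make configuration were clogging something up?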
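
A less drastic alternative to re-cloning may be to clear only the stale configure state. CMake stores the results of find_package in CMakeCache.txt inside the build directory, so a run that once found CUDA 7.5 under /usr can keep reusing that result even after PATH has been fixed. The sketch below has not been tested against this exact setup. The build directory name `build` and the two helper function names are assumptions made for illustration; `CUDA_TOOLKIT_ROOT_DIR` is the hint variable read by CMake's FindCUDA module.

```shell
#!/usr/bin/env bash

# Print the CUDA-related entries cached in a CMake build directory.
# The default directory name "build" is an assumption; pass your own path.
show_cached_cuda() {
    _cache="${1:-build}/CMakeCache.txt"
    if [ -f "$_cache" ]; then
        # "|| true" keeps the exit status clean when no CUDA entry is cached.
        grep '^CUDA_' "$_cache" || true
    fi
}

# Remove the cached configure state so the next cmake run searches for CUDA again.
clear_cmake_cache() {
    _dir="${1:-build}"
    rm -rf "$_dir/CMakeCache.txt" "$_dir/CMakeFiles"
}

# Possible usage from the xgboost checkout (paths are assumptions):
#   show_cached_cuda build      # look for entries still pointing at /usr
#   clear_cmake_cache build
#   cd build && cmake .. -DUSE_CUDA=ON -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-9.1
```

If the cached entries still point at /usr after PATH has been corrected, that would be consistent with the stale-configuration guess above. Passing CUDA_TOOLKIT_ROOT_DIR explicitly is optional and only tells FindCUDA where to look first.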