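I am using Chainer and CuPy with CUDA 8.0. I am trying to train a machine learning model with a Python 3.5 script, but I get this error:
cupy.cuda.runtime.CUDARuntimeError: cudaErrorNoDevice: no CUDA-capable device is detected
How can I fix this?
Below are the environment details of the machine on which I am trying to train the deep learning model, including the output of nvidia-smi, echo $CUDA_PATH, and echo $LD_LIBRARY_PATH: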
root@awsml04:~# nvidia-smi
Thu Mar 21 10:37:19 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130 Driver Version: 384.130 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... On | 00000000:00:1E.0 Off | 0 |
| N/A 38C P0 24W / 300W | 0MiB / 16152MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Check the CUDA path:
root@awsml04:~# echo $CUDA_PATH
/usr/local/cuda/bin:/usr/local/cuda-9.0
Check LD_LIBRARY_PATH:
root@awsml04:~# echo $LD_LIBRARY_PATH
/usr/local/cuda/lib64{LD_LIBRARY_PATH:+:/usr/local/cuda-9.0/lib64:/usr/local/cuda/lib64{LD_LIBRARY_PATH:+:/usr/lib64/openmpi/lib/:/usr/local/cuda/lib64:/usr/local/lib:/usr/lib:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/mpi/lib:/lib/:}}
Check env | grep CUDA:
root@awsml04:~# env | grep CUDA
CUDA_PATH=/usr/local/cuda/bin:
LD_LIBRARY_PATH_WITH_DEFAULT_CUDA=/usr/lib64/openmpi/lib/:/usr/local/cuda/lib64:/usr/local/lib:/usr/lib:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/mpi/lib:/lib/:/usr/local/cuda-9.0/lib/:
LD_LIBRARY_PATH_WITHOUT_CUDA=/usr/lib64/openmpi/lib/:/usr/local/lib:/usr/lib:/usr/local/mpi/lib:/lib/:
Check the python3 path:
root@awsml04:~# which python3
/usr/bin/python3
Check the pip3 path:
root@awsml04:~# which pip3
/usr/bin/pip3
Installed Python libraries with version details:
root@awsml04:~# pip3 freeze
absl-py==0.7.1
alabaster==0.7.12
alembic==1.0.8
appdirs==1.4.3
APScheduler==3.5.3
astor==0.7.1
astroid==2.1.0
awscli==1.16.76
Babel==2.6.0
backcall==0.1.0
beautifulsoup4==4.4.1
bleach==1.5.0
blinker==1.3
bokeh==1.0.3
boto==2.49.0
boto3==1.9.72
botocore==1.12.72
certifi==2018.11.29
chainer==5.3.0
chainerui==0.3.0
chardet==3.0.4
Click==7.0
cloud-init==18.5
cloudpickle==0.6.1
colorama==0.3.9
command-not-found==0.3
configobj==5.0.6
cpplint==1.3.0
cryptography==1.2.3
cycler==0.10.0
dask==1.0.0
decorator==4.3.0
defer==1.0.6
defusedxml==0.5.0
docutils==0.14
easydict==1.9
entrypoints==0.2.3
enum34==1.1.6
environment-kernels==1.1.1
fastrlock==0.4
filelock==2.0.13
Flask==1.0.2
future==0.17.1
gast==0.2.2
glog==0.3.1
graphviz==0.10.1
grpcio==1.19.0
h5py==2.7.1
hibagent==1.0.1
html5lib==0.9999999
idna==2.8
imagesize==1.1.0
ipykernel==5.1.0
ipyparallel==6.2.3
ipython==7.2.0
ipython-genutils==0.2.0
ipywidgets==7.4.2
isort==4.3.4
itsdangerous==1.1.0
jedi==0.13.2
Jinja2==2.10
jmespath==0.9.3
jsonpatch==1.10
jsonpointer==1.9
jsonschema==2.6.0
jupyter==1.0.0
jupyter-client==5.2.4
jupyter-console==6.0.0
jupyter-core==4.4.0
Keras==2.2.4
Keras-Applications==1.0.7
Keras-Preprocessing==1.0.9
kiwisolver==1.0.1
language-selector==0.1
lazy-object-proxy==1.3.1
lxml==3.5.0
Mako==1.0.7
Markdown==2.6.10
MarkupSafe==1.1.0
matplotlib==3.0.2
mccabe==0.6.1
mistune==0.8.4
mock==2.0.0
msgpack==0.6.1
nbconvert==5.4.0
nbformat==4.4.0
networkx==2.2
nose==1.3.7
notebook==5.7.4
numpy==1.15.1
oauthlib==1.0.3
olefile==0.44
opencv-python==3.4.1.15
packaging==18.0
pandas==0.23.4
pandocfilters==1.4.2
parso==0.3.1
pbr==5.1.3
pexpect==4.6.0
pickleshare==0.7.5
Pillow==4.3.0
prettytable==0.7.2
prometheus-client==0.5.0
prompt-toolkit==2.0.7
protobuf==3.7.0
ptyprocess==0.6.0
pyasn1==0.4.5
pycups==1.9.73
pycurl==7.43.0
pydot==1.4.1
pygal==2.4.0
Pygments==2.3.1
pygobject==3.20.0
PyJWT==1.3.0
pylint==2.2.2
pyparsing==2.2.0
pyserial==3.0.1
python-apt==1.1.0b1+ubuntu0.16.4.2
python-dateutil==2.6.1
python-debian==0.1.27
python-editor==1.0.4
python-gflags==3.1.2
python-systemd==231
pytz==2017.3
PyWavelets==1.0.1
pyxdg==0.25
PyYAML==3.13
pyzmq==17.1.2
qtconsole==4.4.3
requests==2.21.0
roman==2.0.0
rsa==3.4.2
s3transfer==0.1.13
scikit-image==0.14.1
scikit-learn==0.20.2
scipy==1.2.0
screen-resolution-extra==0.0.0
seaborn==0.9.0
Send2Trash==1.5.0
six==1.12.0
snowballstemmer==1.2.1
Sphinx==1.8.3
sphinx-rtd-theme==0.1.9
sphinxcontrib-websupport==1.1.0
SQLAlchemy==1.3.1
ssh-import-id==5.5
system-service==0.3
tensorboard==1.12.2
tensorflow==1.12.0
tensorflow-estimator==1.13.0
tensorflow-gpu==1.12.0
tensorflow-tensorboard==0.4.0rc3
termcolor==1.1.0
terminado==0.8.1
testpath==0.4.2
toolz==0.9.0
tornado==5.1.1
tqdm==4.19.5
traitlets==4.3.2
typed-ast==1.1.1
tzlocal==1.5.1
ufw==0.35
unattended-upgrades==0.1
urllib3==1.24.1
virtualenv==15.0.1
wcwidth==0.1.7
webencodings==0.5.1
Werkzeug==0.13
widgetsnbextension==3.4.2
wrapt==1.10.11
xkit==0.0.0
Chainer CUDA information:
root@awsml04:~# python3 -c "import chainer; print(chainer.print_runtime_info())"
/usr/lib/python3.5/site-packages/chainer/backends/cuda.py:98: UserWarning: cuDNN is not enabled.
Please reinstall CuPy after you install cudnn
(see https://docs-cupy.chainer.org/en/stable/install.html#install-cudnn).
'cuDNN is not enabled.\n'
/usr/lib/python3.5/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Platform: Linux-4.4.0-1077-aws-x86_64-with-Ubuntu-16.04-xenial
Chainer: 5.3.0
NumPy: 1.15.1
CuPy:
CuPy Version : 5.3.0
CUDA Root : /usr/local/cuda/bin:/usr/local/cuda-9.0
CUDA Build Version : 9000
CUDA Driver Version : 9000
CUDA Runtime Version : 9000
cuDNN Build Version : None
cuDNN Version : None
NCCL Build Version : 2307
NCCL Runtime Version : 2307
iDeep: Not Available
None
root@awsml04:~# python3 -c "import cupy; print(cupy.empty((3, 3)))"
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
Full error traceback:
stacktrace.py
Exception in main training loop: cudaErrorNoDevice: no CUDA-capable device is detected
Traceback (most recent call last):
File "/root/.see-master/lib/python3.5/site-packages/chainer/training/trainer.py", line 302, in run
entry.extension(self)
File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
File "/root/.see-master/lib/python3.5/site-packages/chainer/reporter.py", line 98, in scope
yield
File "/root/.see-master/lib/python3.5/site-packages/chainer/training/trainer.py", line 299, in run
update()
File "/root/.see-master/lib/python3.5/site-packages/chainer/training/updater.py", line 223, in update
self.update_core()
File "/root/.see-master/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 195, in update_core
self.setup_workers()
File "/root/.see-master/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 186, in setup_workers
with cuda.Device(self._devices[0]):
File "cupy/cuda/device.pyx", line 106, in cupy.cuda.device.Device.__enter__
File "cupy/cuda/runtime.pyx", line 164, in cupy.cuda.runtime.getDevice
File "cupy/cuda/runtime.pyx", line 136, in cupy.cuda.runtime.check_status
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
File "chainer/train_svhn.py", line 258, in <module>
trainer.run()
File "/root/.see-master/lib/python3.5/site-packages/chainer/training/trainer.py", line 313, in run
six.reraise(*sys.exc_info())
File "/usr/lib/python3.5/site-packages/six.py", line 693, in reraise
raise value
File "/root/.see-master/lib/python3.5/site-packages/chainer/training/trainer.py", line 302, in run
entry.extension(self)
File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
File "/root/.see-master/lib/python3.5/site-packages/chainer/reporter.py", line 98, in scope
yield
File "/root/.see-master/lib/python3.5/site-packages/chainer/training/trainer.py", line 299, in run
update()
File "/root/.see-master/lib/python3.5/site-packages/chainer/training/updater.py", line 223, in update
self.update_core()
File "/root/.see-master/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 195, in update_core
self.setup_workers()
File "/root/.see-master/lib/python3.5/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py", line 186, in setup_workers
with cuda.Device(self._devices[0]):
File "cupy/cuda/device.pyx", line 106, in cupy.cuda.device.Device.__enter__
File "cupy/cuda/runtime.pyx", line 164, in cupy.cuda.runtime.getDevice
File "cupy/cuda/runtime.pyx", line 136, in cupy.cuda.runtime.check_status
cupy.cuda.runtime.CUDARuntimeError: cudaErrorNoDevice: no CUDA-capable device is detected
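One detail in the traceback may be worth checking: its Chainer files live under /root/.see-master/lib/python3.5/site-packages, while the earlier checks ran /usr/bin/python3 with packages under /usr/lib/python3.5/site-packages. The training run may therefore be using a different environment, and possibly a different CuPy build, than the one inspected above. The following is a minimal sketch for confirming which interpreter and which installations are in use; the helper name `where_is` is hypothetical, and the script should be run with the same interpreter that launches train_svhn.py.

```python
import importlib.util
import sys


def where_is(module_name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("prefix     :", sys.prefix)
    for name in ("chainer", "cupy"):
        # None means the module is not installed for this interpreter.
        print("{:<11}: {}".format(name, where_is(name)))
```

If the paths printed here differ between the system python3 and the environment used for training, the pip3 freeze listing above does not describe the packages the failing script actually imports.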
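Answer 0 (score: 1)
There is not enough information to determine the cause of the error, but I suggest trying the following.
Important: do not log out, detach, or close the shell until you have run all of the commands below.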
$ export CUDA_PATH=/usr/local/cuda-9.0
$ export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64
$ pip3 uninstall -y chainer cupy cupy-cuda80 cupy-cuda90 cupy-cuda92
$ pip3 install cupy-cuda90 --no-cache-dir && pip3 install chainer --no-cache-dir
$ git clone https://github.com/chainer/chainer.git && cd chainer && git checkout v5.3.0
$ python3 examples/mnist/train_mnist.py --gpu 0
If this works, try running your own script again afterwards.
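The two export lines in this answer replace values that look malformed in the question: CUDA_PATH there holds two colon-separated entries, and LD_LIBRARY_PATH contains literal `{LD_LIBRARY_PATH:+:` text, which suggests a shell expansion that was never evaluated. A rough sketch of a check for both symptoms follows. The function name `env_problems` and the rule that CUDA_PATH should name exactly one directory are assumptions made for illustration, based only on the value this answer sets.

```python
import os


def env_problems(name, value, single_dir=False):
    """Return a list of human-readable problems found in one variable."""
    if value is None:
        return ["{} is not set".format(name)]
    problems = []
    # Braces in a path list usually mean shell syntax was stored literally.
    if "{" in value or "}" in value:
        problems.append(
            "{} contains literal braces (unexpanded shell syntax?)".format(name)
        )
    # Linux-style path lists are colon-separated; empty entries are ignored.
    entries = [entry for entry in value.split(":") if entry]
    if single_dir and len(entries) != 1:
        problems.append(
            "{} should name one directory, found {}".format(name, len(entries))
        )
    return problems


if __name__ == "__main__":
    for name, single in (("CUDA_PATH", True), ("LD_LIBRARY_PATH", False)):
        for problem in env_problems(name, os.environ.get(name), single_dir=single):
            print(problem)
```

Running it before and after the export commands should show whether the shell startup files are still writing the malformed values back in a new session.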