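I have installed all the required packages on my VM, but I have not installed the NVIDIA GPU driver. The requirements contain no instructions for installing it. I would like to know which CUDA version, and which NVIDIA driver compatible with it, I need in order to resolve the following error.
GitHub link: github
Error log: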
File "run_ner.py", line 594, in <module>
main()
File "run_ner.py", line 489, in main
loss = model(input_ids, segment_ids, input_mask, label_ids,valid_ids,l_mask)
File "/home/pt3_gcp/BERT-NER/ber_ner/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "run_ner.py", line 35, in forward
valid_output = torch.zeros(batch_size,max_len,feat_dim,dtype=torch.float32,device='cuda')
File "/home/pt3_gcp/BERT-NER/ber_ner/lib/python3.7/site-packages/torch/cuda/__init__.py", line 178, in _lazy_init
_check_driver()
File "/home/pt3_gcp/BERT-NER/ber_ner/lib/python3.7/site-packages/torch/cuda/__init__.py", line 99, in _check_driver
http://www.nvidia.com/Download/index.aspx""")
AssertionError:
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx
After installing the latest CUDA version from the following link (cuda), I ran into the following error:
06/04/2020 07:38:40 - INFO - __main__ - ***** Running training *****
06/04/2020 07:38:40 - INFO - __main__ - Num examples = 14041
06/04/2020 07:38:40 - INFO - __main__ - Batch size = 32
06/04/2020 07:38:40 - INFO - __main__ - Num steps = 2190
Epoch: 0%| | 0/5 [00:00<?, ?it/sTHCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=50 error=38 : no CUDA-capable device is detectedt/s]
Traceback (most recent call last):
File "run_ner.py", line 594, in <module>
main()
File "run_ner.py", line 489, in main
loss = model(input_ids, segment_ids, input_mask, label_ids,valid_ids,l_mask)
File "/home/pt3_gcp/.local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "run_ner.py", line 35, in forward
valid_output = torch.zeros(batch_size,max_len,feat_dim,dtype=torch.float32,device='cuda')
File "/home/pt3_gcp/.local/lib/python3.7/site-packages/torch/cuda/__init__.py", line 179, in _lazy_init
torch._C._cuda_init()
RuntimeError: cuda runtime error (38) : no CUDA-capable device is detected at /pytorch/aten/src/THC/THCGeneral.cpp:50
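Answer 0 (score: 0)
I ran into the same problem a while ago. The following commands fixed it for me!
The problem arises when you have multiple installations, and since you have tried many things, you probably have several by now. Basically, remove everything: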
sudo apt-get purge 'nvidia-*'  # quoted so the shell does not expand the pattern
sudo apt-get remove nvidia-cuda-toolkit
sudo apt autoremove --purge cuda-10-0  # your version may differ; list installed CUDA packages with: dpkg -l | grep cuda
Also delete the existing files under /usr/local
sudo rm -rf /usr/local/cuda*  # anything related to cuda
sudo rm -rf /usr/local/nvidia*  # anything related to nvidia
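Before reinstalling, it can help to confirm that no NVIDIA or CUDA packages are left behind. The following is only a rough sketch, assuming a Debian/Ubuntu system where dpkg-query is available; the helper names filter_gpu_packages and installed_packages, and the list of name prefixes, are my own choices for illustration and are not part of any tool mentioned above.

```python
import shutil
import subprocess


def filter_gpu_packages(names):
    """Keep only package names that look NVIDIA- or CUDA-related."""
    # Assumed prefixes; adjust them if your packages are named differently.
    prefixes = ("nvidia-", "cuda-", "libnvidia-", "libcuda")
    return sorted({n for n in names if n.startswith(prefixes)})


def installed_packages():
    """Return names of fully installed packages, or [] if dpkg-query is missing."""
    if shutil.which("dpkg-query") is None:
        return []
    result = subprocess.run(
        ["dpkg-query", "-W", "-f", "${db:Status-Abbrev}\t${Package}\n"],
        capture_output=True,
        text=True,
        check=False,
    )
    names = []
    for line in result.stdout.splitlines():
        status, _, name = line.partition("\t")
        # "ii" means the package is selected for install and currently installed.
        if status.startswith("ii") and name:
            names.append(name)
    return names


if __name__ == "__main__":
    leftovers = filter_gpu_packages(installed_packages())
    print("Leftover GPU packages:", leftovers or "none")
```

If the list is not empty, purging the remaining packages before moving on avoids mixing two driver installations again.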
Now, finally, reinstall
sudo apt-get update  # update your package lists
sudo apt search nvidia-driver  # find the latest driver version, then install it with
sudo apt install nvidia-driver-450  # or any other number, depending on the latest version
You must reboot after installing!
sudo reboot
When you are back, run nvidia-smi
and your GPU should work
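To double-check from Python as well, here is a minimal sketch that tests whether nvidia-smi runs and whether PyTorch can see a GPU. The function names nvidia_smi_ok and pick_device are hypothetical helpers introduced for this example; torch.cuda.is_available() is the standard PyTorch call. It assumes Python 3.7 or newer, which matches the paths in the traceback.

```python
import shutil
import subprocess


def nvidia_smi_ok():
    """Return True if nvidia-smi is on PATH and exits with status 0."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        result = subprocess.run([exe], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def pick_device():
    """Return 'cuda' if PyTorch reports a usable GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("nvidia-smi works:", nvidia_smi_ok())
    print("device:", pick_device())
```

The traceback shows that run_ner.py hard-codes device='cuda' in the torch.zeros call on line 35. Passing a value such as pick_device() there instead would presumably let the script fall back to the CPU while the driver is being sorted out, although training would be much slower and the model and inputs would need to be on the same device.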