RuntimeError: cuda runtime error (100) : no CUDA-capable device is detected at ..\aten\src\THC\THCGeneral.cpp:50

Time: 2020-01-13 15:42:16

Tags: deep-learning gpu pytorch bert-language-model

I am trying to run the extractive summarizer of the BERTSUM program (https://github.com/nlpyang/PreSumm/tree/master/src) in test mode with the following command:

python train.py -task ext -mode test -batch_size 3000 -test_batch_size 500 -bert_data_path C:\Users\hp\Downloads\PreSumm-master\PreSumm-master\bert_data -log_file ../logs/val_abs_bert_cnndm -model_path C:\Users\hp\Downloads\bertext_cnndm_transformer -test_from C:\Users\hp\Downloads\bertext_cnndm_transformer\model_1.pt -sep_optim true -use_interval true -visible_gpus 1 -max_pos 512 -max_length 200 -alpha 0.95 -min_length 50 -result_path ../logs/abs_bert_cnndm

Here is the error log:

[2020-01-13 21:03:01,681 INFO] loading weights file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-pytorch_model.bin from cache at ../temp\aa1ef1aede4482d0dbcd4d52baad8ae300e60902e88fcb0bebdec09afd232066.36ca03ab34a1a5d5fa7bc3d03d55c4fa650fed07220e2eeebc06ce58d0e9a157
driver version : 10020
THCudaCheck FAIL file=..\aten\src\THC\THCGeneral.cpp line=50 error=100 : no CUDA-capable device is detected
Traceback (most recent call last):
  File "train.py", line 156, in <module>
    test_ext(args, device_id, cp, step)
  File "C:\Users\hp\Downloads\PreSumm-master\PreSumm-master\src\train_extractive.py", line 190, in test_ext
    model = ExtSummarizer(args, device, checkpoint)
  File "C:\Users\hp\Downloads\PreSumm-master\PreSumm-master\src\models\model_builder.py", line 168, in __init__
    self.to(device)
  File "C:\Users\hp\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 426, in to
    return self._apply(convert)
  File "C:\Users\hp\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 202, in _apply
    module._apply(fn)
  File "C:\Users\hp\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 202, in _apply
    module._apply(fn)
  File "C:\Users\hp\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 202, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "C:\Users\hp\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 224, in _apply
    param_applied = fn(param)
  File "C:\Users\hp\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 424, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
  File "C:\Users\hp\Anaconda3\lib\site-packages\torch\cuda\__init__.py", line 194, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (100) : no CUDA-capable device is detected at ..\aten\src\THC\THCGeneral.cpp:50 

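I am sure I have a CUDA-capable GPU; I confirmed it against NVIDIA's list. It is an NVIDIA GeForce GTX 950M, and I have used this GPU with CUDA for deep learning projects before. Thinking the installation might be the problem, I installed CUDA and cuDNN following these instructions: https://www.easy-tensorflow.com/tf-tutorials/install/cuda-cudnn (latest version, CUDA 10.2). I also tried adding os.environ['CUDA_VISIBLE_DEVICES'] = '0' to train.py (since that worked for people who hit the same error on online help pages). But the error persists.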
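One thing worth checking: the command above passes -visible_gpus 1, and CUDA device indices start at 0. If train.py copies that flag into CUDA_VISIBLE_DEVICES (an assumption about the script, not something confirmed in this post), a machine with a single GPU would end up with no visible device, and an earlier os.environ assignment would be overwritten. The sketch below models how the CUDA runtime interprets an index-based mask; the function name visible_physical_gpus is hypothetical, and GPU UUID entries are deliberately not handled.

```python
def visible_physical_gpus(mask, physical_count):
    """Return the physical GPU indices left visible by a CUDA_VISIBLE_DEVICES mask.

    mask: the environment variable value, or None if it is unset.
    physical_count: how many GPUs the machine really has.
    Simplified model: only integer indices are understood (no GPU UUIDs).
    """
    if mask is None:
        # Variable not set: every GPU is visible.
        return list(range(physical_count))
    visible = []
    for token in mask.split(","):
        token = token.strip()
        # An empty, negative, or non-numeric entry ends the list:
        # only the devices listed before it stay visible.
        if not token.isdigit():
            break
        index = int(token)
        # An index that does not exist on this machine also ends the list.
        if index >= physical_count:
            break
        visible.append(index)
    return visible


if __name__ == "__main__":
    # Single-GPU laptop: mask "1" hides the only GPU (index 0).
    print(visible_physical_gpus("1", 1))
    # Mask "0" exposes it.
    print(visible_physical_gpus("0", 1))
```

Under that assumption, rerunning with -visible_gpus 0 would be the first thing to try. The variable must also be set before PyTorch initializes CUDA, since later changes have no effect on an already initialized process.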

I would be very grateful if someone could help me solve this problem.

1 Answer:

Answer 0 (score: 0)

Have you checked whether your device (GPU) is enabled? To see what the driver reports for the device, try running:

nvidia-smi
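The same check can be made from the Python environment that runs train.py, which also shows whether PyTorch itself sees the GPU. This is a minimal sketch rather than a definitive diagnostic: the function name gpu_report is hypothetical, it assumes nvidia-smi is on PATH (on Windows it may not be), and it treats a missing PyTorch install as "unknown" instead of failing.

```python
import shutil
import subprocess


def gpu_report():
    """Collect what the driver (nvidia-smi) and PyTorch each say about GPUs."""
    report = {"nvidia_smi": None, "torch_cuda_available": None, "torch_devices": []}

    # Ask the driver: "nvidia-smi -L" lists the GPUs it knows about.
    exe = shutil.which("nvidia-smi")
    if exe is not None:
        try:
            result = subprocess.run(
                [exe, "-L"], capture_output=True, text=True, timeout=30
            )
            report["nvidia_smi"] = result.stdout.strip() or result.stderr.strip()
        except (OSError, subprocess.TimeoutExpired) as exc:
            report["nvidia_smi"] = f"failed to run: {exc}"

    # Ask PyTorch, if it is installed in this environment.
    try:
        import torch
    except ImportError:
        return report

    try:
        report["torch_cuda_available"] = torch.cuda.is_available()
        report["torch_devices"] = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
    except RuntimeError as exc:
        # CUDA initialization itself failed; keep the message for inspection.
        report["torch_cuda_available"] = False
        report["torch_devices"] = [f"error: {exc}"]
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If nvidia-smi lists the GTX 950M but torch_cuda_available is False, the problem is more likely in the PyTorch build or in CUDA_VISIBLE_DEVICES than in the driver.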