Question

I am trying to run the extractive summarizer of the BERTSUM program (https://github.com/nlpyang/PreSumm/tree/master/src) in test mode with the following command:

python train.py -task ext -mode test -batch_size 3000 -test_batch_size 500 -bert_data_path C:\Users\hp\Downloads\PreSumm-master\PreSumm-master\bert_data -log_file ../logs/val_abs_bert_cnndm -model_path C:\Users\hp\Downloads\bertext_cnndm_transformer -test_from C:\Users\hp\Downloads\bertext_cnndm_transformer\model_1.pt -sep_optim true -use_interval true -visible_gpus 1 -max_pos 512 -max_length 200 -alpha 0.95 -min_length 50 -result_path ../logs/abs_bert_cnndm

This is the error log:

[2020-01-13 21:03:01,681 INFO] loading weights file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-pytorch_model.bin from cache at ../temp\aa1ef1aede4482d0dbcd4d52baad8ae300e60902e88fcb0bebdec09afd232066.36ca03ab34a1a5d5fa7bc3d03d55c4fa650fed07220e2eeebc06ce58d0e9a157
driver version : 10020
THCudaCheck FAIL file=..\aten\src\THC\THCGeneral.cpp line=50 error=100 : no CUDA-capable device is detected
Traceback (most recent call last):
  File "train.py", line 156, in <module>
    test_ext(args, device_id, cp, step)
  File "C:\Users\hp\Downloads\PreSumm-master\PreSumm-master\src\train_extractive.py", line 190, in test_ext
    model = ExtSummarizer(args, device, checkpoint)
  File "C:\Users\hp\Downloads\PreSumm-master\PreSumm-master\src\models\model_builder.py", line 168, in __init__
    self.to(device)
  File "C:\Users\hp\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 426, in to
    return self._apply(convert)
  File "C:\Users\hp\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 202, in _apply
    module._apply(fn)
  File "C:\Users\hp\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 202, in _apply
    module._apply(fn)
  File "C:\Users\hp\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 202, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "C:\Users\hp\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 224, in _apply
    param_applied = fn(param)
  File "C:\Users\hp\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 424, in convert
    return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
  File "C:\Users\hp\Anaconda3\lib\site-packages\torch\cuda\__init__.py", line 194, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (100) : no CUDA-capable device is detected at ..\aten\src\THC\THCGeneral.cpp:50

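I am sure I have a CUDA-capable GPU; I checked it against NVIDIA's list. It is an NVIDIA GeForce GTX 950M, and I have used it for CUDA deep learning projects before. Thinking the installation might be the problem, I installed CUDA and cuDNN following these instructions: https://www.easy-tensorflow.com/tf-tutorials/install/cuda-cudnn (latest version, CUDA 10.2). I also tried adding os.environ['CUDA_VISIBLE_DEVICES'] = '0' to train.py, since that worked for people who reported the same error on online help pages. The error persists.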

I would be very grateful if someone could help me solve this problem.

Answer 1

Have you checked whether your device (GPU) is enabled? To see what the driver reports for the device, try running:

nvidia-smi
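
If nvidia-smi lists the GPU, it may also help to check what PyTorch itself can see. The following is only a minimal sketch, not part of PreSumm: the helper names `cuda_report` and `indices_in_range` are made up for illustration. It prints the current CUDA_VISIBLE_DEVICES value and PyTorch's view of the devices, and checks whether a requested GPU index exists. On a machine with a single GPU the only valid index is 0, while the command in the question passes -visible_gpus 1. Whether the script forwards that flag to CUDA_VISIBLE_DEVICES (which would hide the only GPU and override a value set earlier) is an assumption worth verifying in train.py.

```python
import os


def parse_visible_devices(value):
    """Return the integer GPU indices in a CUDA_VISIBLE_DEVICES-style string."""
    return [int(part) for part in value.split(",") if part.strip().isdigit()]


def indices_in_range(value, physical_gpu_count):
    """True if at least one index is given and every index refers to an existing GPU."""
    indices = parse_visible_devices(value)
    return bool(indices) and all(0 <= i < physical_gpu_count for i in indices)


def cuda_report():
    """Collect what the environment and PyTorch say about CUDA devices."""
    report = {"CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES")}
    try:
        import torch
    except ImportError:
        report["torch"] = None  # PyTorch is not installed in this interpreter
        return report
    report["torch"] = torch.__version__
    report["built_with_cuda"] = torch.version.cuda  # None for CPU-only builds
    report["available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    print(cuda_report())
    # Hypothetical check: index "1" on a machine with one physical GPU.
    print(indices_in_range("1", 1))
```

If `available` is False or `device_count` is 0 even though nvidia-smi shows the card, the visible-device setting or the installed PyTorch build (CPU-only, or built for a different CUDA version) are the first things to look at.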
