如何在 A40 GPU 上运行 PyTorch 没有错误(也有 DDP)?

时间:2021-05-22 02:20:46

标签: python pytorch

我尝试运行我的 pytorch 代码但出现此错误:

A40 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
If you want to use the A40 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Using backend: pytorch
/home/miranda9/miniconda3/envs/metalearningpy1.7.1c10.2/lib/python3.8/site-packages/torch/cuda/__init__.py:104: UserWarning: 
A40 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
If you want to use the A40 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Traceback (most recent call last):
  File "/home/miranda9/ML4Coq/ml4coq-proj-src/embeddings_zoo/tree_nns/main_brando.py", line 305, in <module>
    main_distributed()
  File "/home/miranda9/ML4Coq/ml4coq-proj-src/embeddings_zoo/tree_nns/main_brando.py", line 201, in main_distributed
    mp.spawn(fn=train, args=(opts,), nprocs=opts.world_size)
  File "/home/miranda9/miniconda3/envs/metalearningpy1.7.1c10.2/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/miranda9/miniconda3/envs/metalearningpy1.7.1c10.2/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
    while not context.join():
  File "/home/miranda9/miniconda3/envs/metalearningpy1.7.1c10.2/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception: 

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/miranda9/miniconda3/envs/metalearningpy1.7.1c10.2/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/home/miranda9/ML4Coq/ml4coq-proj-src/embeddings_zoo/tree_nns/main_brando.py", line 210, in train
    setup_process(opts, rank, master_port=opts.master_port, world_size=opts.world_size)
  File "/home/miranda9/ultimate-utils/ultimate-utils-proj-src/uutils/torch/distributed.py", line 165, in setup_process
    dist.init_process_group(backend, rank=rank, world_size=world_size)
  File "/home/miranda9/miniconda3/envs/metalearningpy1.7.1c10.2/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 455, in init_process_group
    barrier()
  File "/home/miranda9/miniconda3/envs/metalearningpy1.7.1c10.2/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1960, in barrier
    work = _default_pg.barrier()
RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1607369981906/work/torch/lib/c10d/ProcessGroupNCCL.cpp:31, unhandled cuda error, NCCL version 2.7.8

但是它让我下载它用于我的 mac ......?这很奇怪。什么版本的 GPU A40 需要 pytorch、cuda、cudnn、nccl 和其他东西吗?

要查看我运行的代码和 conda env 信息,请查看:https://github.com/pytorch/pytorch/issues/58794


相关链接

0 个答案:

没有答案