Question

I do this:

import torch
from torch.autograd import Variable

t = Variable(torch.randn(5))
t = t.cuda()
print(t)

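But it takes 5 to 10 hours every time. I tested the bandwidth with the CUDA samples, and it is fine. Then I used pdb to find which step takes the most time. In /anaconda3/lib/python3.6/site-packages/torch/cuda/__init__ I found: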

def _lazy_new(cls, *args, **kwargs):
    _lazy_init()
    # We need this method only for lazy init, so we can remove it
    del _CudaBase.__new__
    return super(_CudaBase, cls).__new__(cls, *args, **kwargs)

The return statement takes about 5 hours. I don't know how to use this information to solve my problem. My environment is: Ubuntu 16.04 + CUDA 9.1

Answer 1

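There was a mismatch between the CUDA version my PyTorch was compiled with and the CUDA version I am running. I had split the official installation command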

conda install pytorch torchvision cuda90 -c pytorch

into two parts:

conda install -c soumith magma-cuda90

conda install pytorch torchvision -c soumith

By default, the second command installed pytorch-0.2.0, which matches CUDA 8.0. After I updated my PyTorch to 0.3.0, the cuda() call takes only one second.
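To check for this kind of mismatch, it can help to print the CUDA version the installed PyTorch build reports and compare it with the toolkit on the machine. The following is a minimal sketch, not an official diagnostic: `same_major` and `report_cuda_build` are hypothetical helper names, and comparing only the major version is a rough heuristic based on this answer (8.0 build against a 9.1 system was slow, 9.0 build was fine).

```python
import importlib


def same_major(build_version, runtime_version):
    """Rough heuristic (an assumption): compare only the major CUDA version."""
    return build_version.split(".")[0] == runtime_version.split(".")[0]


def report_cuda_build():
    """Return version info from the installed PyTorch, or None if it is missing."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return None
    return {
        "torch": torch.__version__,
        # CUDA version the binary was built with; may be None on CPU-only builds
        "built_with_cuda": getattr(torch.version, "cuda", None),
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(report_cuda_build())
    # The versions from this answer: a CUDA 8.0 build on a CUDA 9.1 system
    print(same_major("8.0", "9.1"))
```

If the reported build version and the system version differ in the way described above, reinstalling a PyTorch build that targets the installed CUDA version is the fix this answer used.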

Answer 2

Try this:

import torch
from torch.autograd import Variable

torch.cuda.synchronize()
t = Variable(torch.randn(5))
t = t.cuda()
print(t)

After that, it should be fast, depending on your GPU memory, at least on each rerun.
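One way to see whether the delay comes from the first CUDA call or from every transfer is to time the same move twice. This is only a sketch under assumptions: `time_call` is a hypothetical helper, and the tensor is moved directly with `torch.randn(5).cuda()` instead of wrapping it in `Variable`, which works on recent PyTorch versions.

```python
import time


def time_call(fn):
    """Run fn once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def main():
    try:
        import torch
    except ImportError:
        print("torch is not installed")
        return
    if not torch.cuda.is_available():
        print("CUDA is not available")
        return

    def move():
        torch.randn(5).cuda()
        # Wait for pending GPU work so the timing covers the whole transfer
        torch.cuda.synchronize()

    # The first call also pays for CUDA initialization; the second should not
    print("first call:  %.3f s" % time_call(move))
    print("second call: %.3f s" % time_call(move))


if __name__ == "__main__":
    main()
```

If only the first call is slow, the cost is in CUDA initialization rather than in the copy itself.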

Every time I use cuda() in PyTorch to move a Variable from CPU to GPU, it takes about 5 to 10 hours

2 answers: