Question

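I was just going through the PyTorch beginner tutorial and noticed that one of the many ways of putting a tensor (basically the same as a numpy array) on the GPU takes a suspiciously long time compared to the other methods: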

import time
import torch

if torch.cuda.is_available():
    print('time =', time.time())
    x = torch.randn(4, 4)
    device = torch.device("cuda")
    print('time =', time.time())
    y = torch.ones_like(x, device=device)  # directly create a tensor on GPU  => 2.5 secs??
    print('time =', time.time())
    x = x.to(device)                       # or just use strings ``.to("cuda")``
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))       # ``.to`` can also change dtype together!
    a = torch.ones(5)
    print(a.cuda())
    print('time =', time.time())
else:
    print('I recommend you get CUDA to work, my good friend!')

Output (timings):

time = 1551809363.28284
time = 1551809363.282943
time = 1551809365.7204516  # (!)
time = 1551809365.7236063

Version details:

1 CUDA device: GeForce GTX 1050, driver version 415.27
CUDA          = 9.0.176
PyTorch       = 1.0.0
cuDNN         = 7401
Python        = 3.5.2
GCC           = 5.4.0
OS            = Linux Mint 18.3
Linux kernel  = 4.15.0-45-generic

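As you can see, this one operation ("y = ...") takes much longer (2.5 seconds) than the rest combined (0.003 seconds). I'm confused by this, since I would expect all of these methods to do basically the same thing. I have tried making sure the types in this line are 32-bit, and tried different shapes, but nothing changed.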

Answer 1

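When I re-order the commands, whichever command is on top takes 2.5 seconds. This leads me to believe that a one-time setup of the device is happening here, and that subsequent allocations on the GPU will be faster.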
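One way to check this is to time two identical GPU allocations back to back. The following is only a minimal sketch, assuming the delay really is one-time CUDA initialization; the helper names `timed` and `measure_warmup` are made up for illustration and are not part of PyTorch. It also calls `torch.cuda.synchronize()` before stopping the clock, since CUDA work is launched asynchronously and `time.time()` alone can misattribute where the time goes.

```python
import time

try:
    import torch
except ImportError:  # lets the timing helper be used without PyTorch installed
    torch = None


def timed(fn):
    """Run fn() once and return (result, elapsed_seconds). Hypothetical helper."""
    start = time.perf_counter()
    result = fn()
    # CUDA kernels launch asynchronously; wait for them before stopping the clock.
    if torch is not None and torch.cuda.is_available():
        torch.cuda.synchronize()
    return result, time.perf_counter() - start


def measure_warmup():
    """Time two identical GPU allocations; return None if CUDA is unavailable."""
    if torch is None or not torch.cuda.is_available():
        return None
    device = torch.device("cuda")
    # Assumption: the first call pays for the one-time device setup.
    _, first = timed(lambda: torch.ones(4, 4, device=device))
    # The second call should reuse whatever was set up by the first.
    _, second = timed(lambda: torch.ones(4, 4, device=device))
    return first, second


if __name__ == "__main__":
    timings = measure_warmup()
    if timings is None:
        print("CUDA not available; nothing to measure.")
    else:
        print("first allocation:  %.4f s" % timings[0])
        print("second allocation: %.4f s" % timings[1])
```

If the one-time setup explanation holds, the first number should be on the order of seconds and the second a small fraction of that, regardless of which GPU operation is placed first.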
