Titan XP vs. Quadro P400 GPUs in PyTorch

Time: 2018-03-12 12:45:50

Tags: performance time cuda gpu pytorch

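I tried both GPUs in my machine, and I expected the Titan XP to be faster than the Quadro P400. However, the execution times on the two were almost identical.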

I need to know whether PyTorch dynamically chooses one GPU over the other, or whether I have to specify at run time which one PyTorch should use.

Here is the code snippet used in the test:

import torch
import time

def do_something(gpu_device):
    torch.cuda.set_device(gpu_device)  # torch.cuda.set_device(device_num)
    print("current GPU device ", torch.cuda.current_device())
    strt = time.time()
    a = torch.randn(100000000).cuda()   
    xx = time.time() - strt
    print("execution time, to create 1E8 random numbers, is ", xx)
    # print(a)
    # print(a + 2)

no_of_GPUs = torch.cuda.device_count()
print("how many GPUs are there:", no_of_GPUs)
for i in range(no_of_GPUs):
    print(i, "th GPU is", torch.cuda.get_device_name(i))
    do_something(i)

Sample output:

how many GPUs are there: 2
0 th GPU is TITAN Xp COLLECTORS EDITION
current GPU device  0
execution time, to create 1E8 random numbers, is  5.527713775634766

1 th GPU is Quadro P400
current GPU device  1
execution time, to create 1E8 random numbers, is  5.511776685714722

1 Answer:

Answer 0: (score: 3)

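Despite what you might believe, the lack of a performance difference you are seeing is because the random number generation runs on the host CPU, not on the GPU: torch.randn(N) builds the tensor in host memory, and .cuda() only copies it to the device afterwards. If I modify your do_something routine like this: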

def do_something(gpu_device, ongpu=False, N=100000000):
    torch.cuda.set_device(gpu_device)
    print("current GPU device ", torch.cuda.current_device())
    strt = time.time()
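    if ongpu:
        # allocate on the current GPU, then fill in place with normal samples
        a = torch.cuda.FloatTensor(N).normal_()
    else:
        # sample on the host CPU, then copy the result to the current GPU
        a = torch.randn(N).cuda()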
    print("execution time, to create 1E8 random no, is ", time.time() - strt)
    return a

and run it both ways, I get very different execution times:

In [4]: do_something(0)
current GPU device  0
execution time, to create 1E8 random no, is  7.736972808837891
Out[4]: 

-9.3955e-01
-1.9721e-01
-1.1502e+00
     ......     
-1.2428e+00
 3.1547e-01
-2.1870e+00
[torch.cuda.FloatTensor of size 100000000 (GPU 0)]

In [5]: do_something(0,True)
current GPU device  0
execution time, to create 1E8 random no, is  0.001735687255859375
Out[5]: 

 4.1403e+06
 5.7016e+06
 1.2710e+07
     ......     
 8.9790e+06
 1.3779e+07
 8.0731e+06
[torch.cuda.FloatTensor of size 100000000 (GPU 0)]

i.e. your version takes about 7 seconds and mine takes about 1.7 milliseconds. I think it is obvious which one is running on the GPU...
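
On the other half of the question: PyTorch does not pick a GPU dynamically. CUDA tensors go to the current device (index 0 unless changed with torch.cuda.set_device) or to whatever device is named explicitly. Below is a minimal sketch of how the comparison could be written, assuming PyTorch 0.4 or later, where torch.device and the device= keyword are available (the code in this thread uses the older 0.3-style API). The helper name timed_randn and the smaller N are my own choices for illustration. Because CUDA kernels are launched asynchronously, the sketch calls torch.cuda.synchronize before reading the clock; without that, a timing taken with time.time() may mostly reflect the kernel launch rather than the work itself.

```python
import time
import torch


def timed_randn(n, device):
    """Create n standard-normal floats directly on `device`.

    Returns (tensor, elapsed_seconds). Hypothetical helper, not part of PyTorch.
    """
    if device.type == "cuda":
        torch.cuda.synchronize(device)  # let any pending GPU work finish first
    start = time.perf_counter()
    a = torch.randn(n, device=device)  # sampled on the device, no host copy
    if device.type == "cuda":
        torch.cuda.synchronize(device)  # wait until the kernel has really finished
    return a, time.perf_counter() - start


if torch.cuda.is_available():
    devices = [torch.device("cuda", i) for i in range(torch.cuda.device_count())]
else:
    devices = [torch.device("cpu")]  # fallback so the sketch still runs without a GPU

N = 1_000_000  # assumption: smaller than the 1E8 used above, to keep the run short
for dev in devices:
    a, seconds = timed_randn(N, dev)
    print(dev, "->", a.device, "took", seconds, "s")
```

The first CUDA call in a process also pays a one-time context initialization cost, so it is usually worth running each measurement more than once and ignoring the first result.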