我试用了我的机器上的两个GPU,我希望Titan-XP比Quadro-P400更快。但是,两者的执行时间几乎相同。
我需要知道PyTorch是否会动态地选择一个GPU而不是另一个,或者,我自己将必须在运行时指定PyTorch将使用哪一个。
以下是测试中使用的代码段:
import torch
import time
def do_something(gpu_device):
torch.cuda.set_device(gpu_device) # torch.cuda.set_device(device_num)
print("current GPU device ", torch.cuda.current_device())
strt = time.time()
a = torch.randn(100000000).cuda()
xx = time.time() - strt
print("execution time, to create 1E8 random numbers, is ", xx)
# print(a)
# print(a + 2)
no_of_GPUs= torch.cuda.device_count()
print("how many GPUs are there:", no_of_GPUs)
for i in range(0, no_of_GPUs):
print(i, "th GPU is", torch.cuda.get_device_name(i))
do_something(i)
示例输出:
how many GPUs are there: 2
0 th GPU is TITAN Xp COLLECTORS EDITION
current GPU device 0
execution time, to create 1E8 random numbers, is 5.527713775634766
1 th GPU is Quadro P400
current GPU device 1
execution time, to create 1E8 random numbers, is 5.511776685714722
答案 0 :(得分:3)
尽管您可能会相信,但您看到的性能差异不足是因为随机数生成是在主机CPU而不是GPU上运行的。如果我像这样修改你的do_something
例程:
def do_something(gpu_device, ongpu=False, N=100000000):
torch.cuda.set_device(gpu_device)
print("current GPU device ", torch.cuda.current_device())
strt = time.time()
if ongpu:
a = torch.cuda.FloatTensor(N).normal_()
else:
a = torch.randn(N).cuda()
print("execution time, to create 1E8 random no, is ", time.time() - strt)
return a
并以两种方式运行,我得到非常不同的执行时间:
In [4]: do_something(0)
current GPU device 0
execution time, to create 1E8 random no, is 7.736972808837891
Out[4]:
-9.3955e-01
-1.9721e-01
-1.1502e+00
......
-1.2428e+00
3.1547e-01
-2.1870e+00
[torch.cuda.FloatTensor of size 100000000 (GPU 0)]
In [5]: do_something(0,True)
current GPU device 0
execution time, to create 1E8 random no, is 0.001735687255859375
Out[5]:
4.1403e+06
5.7016e+06
1.2710e+07
......
8.9790e+06
1.3779e+07
8.0731e+06
[torch.cuda.FloatTensor of size 100000000 (GPU 0)]
即。你的版本需要7秒,我的需要1.7毫秒。我认为很明显哪一个在GPU上运行....