I want to run torch.inverse() on multiple GPUs in parallel, but I can't get it to work.
I saw the post Matmul on multiple GPUs, which covers the whole matmul workflow. It shows that if you allocate one tensor to each GPU, the matmuls run in parallel. I was able to reproduce this behavior for matmul, but when I try the same thing with torch.inverse(), it appears to run sequentially when I check `watch nvidia-smi`. Likewise, when I replace the torch.inverse() call with the torch FFT function, I do get parallel GPU usage. Any ideas?
import torch

ngpu = torch.cuda.device_count()

# Allocate one tensor to each GPU.
lis = []
for i in range(ngpu):
    lis.append(torch.rand(5000, 5000, device='cuda:' + str(i)))

# Per the matmul-on-multiple-GPUs post this should already run in parallel,
# but it doesn't seem to, based on watch nvidia-smi.
C_ = []
for i in range(ngpu):
    C_.append(torch.inverse(lis[i]))
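
Beyond eyeballing nvidia-smi, a more direct check might be to time the whole loop with explicit synchronization. Here is a minimal sketch of that idea (the warm-up pass and timing approach are my own additions, not from the linked post): if the inverses overlapped, the total wall time should stay close to the single-GPU time rather than scale with ngpu.

import time
import torch

ngpu = torch.cuda.device_count()
lis = [torch.rand(5000, 5000, device='cuda:' + str(i)) for i in range(ngpu)]

# Warm-up pass so one-time costs (cuBLAS/cuSOLVER handle creation,
# kernel loading) don't pollute the measurement.
for t in lis:
    torch.inverse(t)
for i in range(ngpu):
    torch.cuda.synchronize(i)

start = time.time()
C_ = [torch.inverse(t) for t in lis]
# Wait for every device to finish before reading the clock.
for i in range(ngpu):
    torch.cuda.synchronize(i)
print('%d inverses took %.3fs' % (ngpu, time.time() - start))
# Sequential execution would take roughly ngpu times the single-GPU time;
# parallel execution should take roughly the single-GPU time.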
Edit: for comparison, here is the FFT code (below); the matmul code is in the link above.
import torch

ngpu = torch.cuda.device_count()

# Allocate one tensor to each GPU. The trailing dimension of size 2 holds
# the real/imaginary parts, as the old torch.fft API expects.
lis = []
for i in range(ngpu):
    lis.append(torch.rand(5000, 5000, 2, device='cuda:' + str(i)))

# Unlike the torch.inverse version above, these FFT calls do show
# parallel GPU usage in watch nvidia-smi.
C_ = []
for i in range(ngpu):
    C_.append(torch.fft(lis[i], 2))
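
In case the serialization happens on the host side (for instance, if torch.inverse blocks the CPU between launches), one workaround to try is submitting each inverse from its own host thread. This is only a sketch of the idea; I have not verified that it restores parallelism:

from concurrent.futures import ThreadPoolExecutor

import torch

ngpu = torch.cuda.device_count()
lis = [torch.rand(5000, 5000, device='cuda:' + str(i)) for i in range(ngpu)]

def inv(t):
    # Each worker thread submits work on its tensor's own device, so any
    # host-side blocking in one call shouldn't hold up the other GPUs.
    with torch.cuda.device(t.device):
        return torch.inverse(t)

with ThreadPoolExecutor(max_workers=ngpu) as pool:
    C_ = list(pool.map(inv, lis))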