A friend of mine implemented a working sparse version of torch.bmm, but when I try to test it I run into a runtime error (unrelated to that implementation) that I don't understand. I have seen a few threads about this error but could not find a solution. Here are the code and the error:
if __name__ == "__main__":

    tmp = torch.zeros(1).cuda()

    batch_csr = BatchCSR()
    sparse_bmm = SparseBMM()

    i = torch.LongTensor([[0,5,8], [1,5,8], [2,5,8]])
    v = torch.FloatTensor([4,3,8])
    s = torch.Size([3,500,500])

    indices, values, size = i, v, s

    a_ = torch.sparse.FloatTensor(indices, values, size).cuda().transpose(2, 1)
    batch_size, num_nodes, num_faces = a_.size()

    a = a_.to_dense()

    for _ in range(10):
        b = torch.randn(batch_size, num_faces, 16).cuda()

        torch.cuda.synchronize()
        time1 = time.time()
        result = torch.bmm(a, b)
        torch.cuda.synchronize()
        time2 = time.time()
        print("{} CuBlas dense bmm".format(time2 - time1))

        torch.cuda.synchronize()
        time1 = time.time()
        col_ind, col_ptr = batch_csr(a_.indices(), a_.size())
        my_result = sparse_bmm(a_.values(), col_ind, col_ptr, a_.size(), b)
        torch.cuda.synchronize()
        time2 = time.time()
        print("{} My sparse bmm".format(time2 - time1))

        print("{} Diff".format((result - my_result).abs().max()))
The error:
Traceback (most recent call last):
  File "sparse_bmm.py", line 72, in <module>
    b = torch.randn(3, 500, 16).cuda()
  File "/home/bizeul/virtual_env/lib/python2.7/site-packages/torch/_utils.py", line 65, in _cuda
    return new_type(self.size()).copy_(self, async)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /b/wheel/pytorch-src/torch/lib/THC/generic/THCTensorCopy.c:18
When running with CUDA_LAUNCH_BLOCKING=1, the error becomes:
/b/wheel/pytorch-src/torch/lib/THC/THCTensorIndex.cu:121: void indexAddSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 1, SrcDim = 1, IdxDim = -2]: block: [0,0,0], thread: [0,0,0] Assertion `dstIndex < dstAddDimSize` failed.
THCudaCheck FAIL file=/b/wheel/pytorch-src/torch/lib/THCS/generic/THCSTensorMath.cu line=292 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "sparse_bmm.py", line 69, in <module>
    a = a_.to_dense()
RuntimeError: cuda runtime error (59) : device-side assert triggered at /b/wheel/pytorch-src/torch/lib/THCS/generic/THCSTensorMath.cu:292
Answer 0 (score: 1)
The indices you are passing to create the sparse tensor are incorrect.
Here is how they should look:
i = torch.LongTensor([[0, 1, 2], [5, 5, 5], [8, 8, 8]])
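For context, here is a minimal sketch of how the corrected indices slot into the question's script (the BatchCSR/SparseBMM parts are left out because they are not the cause of the error, and the exact print formatting may vary by PyTorch version):

import torch

# Each *column* of the index tensor gives the coordinates of one value,
# so the three values land at (0, 5, 8), (1, 5, 8) and (2, 5, 8),
# all of which fit inside torch.Size([3, 500, 500]).
i = torch.LongTensor([[0, 1, 2], [5, 5, 5], [8, 8, 8]])
v = torch.FloatTensor([4, 3, 8])
s = torch.Size([3, 500, 500])

a_ = torch.sparse.FloatTensor(i, v, s).cuda().transpose(2, 1)
a = a_.to_dense()      # no device-side assert anymore
print(a[0, 8, 5])      # should show 4: the transpose moved (0, 5, 8) to (0, 8, 5)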
How to create a sparse tensor:
Let's take a simpler example. Say we want the following tensor:
0 0 0 2 0
0 0 0 0 0
0 0 0 0 20
[torch.cuda.FloatTensor of size 3x5 (GPU 0)]
As you can see, the number (2) needs to be at position (0, 3) of the sparse tensor, and the number (20) needs to be at position (2, 4).
To create it, our index tensor should look like this:
[[0 , 2],
[3 , 4]]
And now, the code to create the sparse tensor above:
i=torch.LongTensor([[0, 2], [3, 4]])
v=torch.FloatTensor([2, 20])
s=torch.Size([3, 5])
a_ = torch.sparse.FloatTensor(i, v, s).cuda()
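As a quick sanity check (the exact print formatting depends on the PyTorch version), densifying the tensor reproduces the 3x5 matrix above:

print(a_.to_dense())
# Expected values:
# 0  0  0  2   0
# 0  0  0  0   0
# 0  0  0  0  20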
One more comment on the CUDA assertion error:
Assertion 'dstIndex < dstAddDimSize' failed.
tells us that, most likely, one of your indices is out of range. So whenever you notice this error, look for a place where you might be passing a wrong index to one of your tensors.
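One way to catch this kind of mistake before it turns into an opaque device-side assert is to validate the indices on the CPU first. This is a hedged sketch, not part of the original answer, and check_coo_indices is just a hypothetical helper name:

import torch

def check_coo_indices(indices, size):
    # indices is a LongTensor of shape (ndim, nnz); row d holds the
    # coordinates along dimension d for every non-zero value.
    for d in range(indices.size(0)):
        dim_max = int(indices[d].max())
        if dim_max >= size[d]:
            raise ValueError(
                "index {} out of range for dimension {} (size {})"
                .format(dim_max, d, size[d]))

# The original (incorrect) indices from the question fail the check:
i_bad = torch.LongTensor([[0, 5, 8], [1, 5, 8], [2, 5, 8]])
check_coo_indices(i_bad, torch.Size([3, 500, 500]))
# ValueError: index 8 out of range for dimension 0 (size 3)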