A friend of mine implemented a working sparse version of torch.bmm, but when I try to test it I run into a runtime error (unrelated to that implementation) that I don't understand. I have seen a few threads about this error but could not find a solution. Here are the code and the error:
if __name__ == "__main__":

    tmp = torch.zeros(1).cuda()

    batch_csr = BatchCSR()
    sparse_bmm = SparseBMM()

    i = torch.LongTensor([[0,5,8], [1,5,8], [2,5,8]])
    v = torch.FloatTensor([4,3,8])
    s = torch.Size([3,500,500])

    indices, values, size = i, v, s

    a_ = torch.sparse.FloatTensor(indices, values, size).cuda().transpose(2, 1)
    batch_size, num_nodes, num_faces = a_.size()

    a = a_.to_dense()

    for _ in range(10):
        b = torch.randn(batch_size, num_faces, 16).cuda()

        torch.cuda.synchronize()
        time1 = time.time()
        result = torch.bmm(a, b)
        torch.cuda.synchronize()
        time2 = time.time()
        print("{} CuBlas dense bmm".format(time2 - time1))

        torch.cuda.synchronize()
        time1 = time.time()
        col_ind, col_ptr = batch_csr(a_.indices(), a_.size())
        my_result = sparse_bmm(a_.values(), col_ind, col_ptr, a_.size(), b)
        torch.cuda.synchronize()
        time2 = time.time()
        print("{} My sparse bmm".format(time2 - time1))

        print("{} Diff".format((result - my_result).abs().max()))
The error:
Traceback (most recent call last):
  File "sparse_bmm.py", line 72, in <module>
    b = torch.randn(3, 500, 16).cuda()
  File "/home/bizeul/virtual_env/lib/python2.7/site-packages/torch/_utils.py", line 65, in _cuda
    return new_type(self.size()).copy_(self, async)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /b/wheel/pytorch-src/torch/lib/THC/generic/THCTensorCopy.c:18
When running with CUDA_LAUNCH_BLOCKING=1, the error becomes:
/b/wheel/pytorch-src/torch/lib/THC/THCTensorIndex.cu:121: void indexAddSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 1, SrcDim = 1, IdxDim = -2]: block: [0,0,0], thread: [0,0,0] Assertion `dstIndex < dstAddDimSize` failed.
THCudaCheck FAIL file=/b/wheel/pytorch-src/torch/lib/THCS/generic/THCSTensorMath.cu line=292 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "sparse_bmm.py", line 69, in <module>
    a = a_.to_dense()
RuntimeError: cuda runtime error (59) : device-side assert triggered at /b/wheel/pytorch-src/torch/lib/THCS/generic/THCSTensorMath.cu:292
Answer 0 (score: 1)
The indices you are passing to create the sparse tensor are incorrect.
Here is how they should look:
i = torch.LongTensor([[0, 1, 2], [5, 5, 5], [8, 8, 8]])
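For context, here is a minimal sketch of how the corrected indices slot into the question's script (the BatchCSR/SparseBMM parts are left out because they are not the cause of the error, and the exact print formatting may vary by PyTorch version):

import torch

# Each *column* of the index tensor gives the coordinates of one value,
# so the three values land at (0, 5, 8), (1, 5, 8) and (2, 5, 8),
# all of which fit inside torch.Size([3, 500, 500]).
i = torch.LongTensor([[0, 1, 2], [5, 5, 5], [8, 8, 8]])
v = torch.FloatTensor([4, 3, 8])
s = torch.Size([3, 500, 500])

a_ = torch.sparse.FloatTensor(i, v, s).cuda().transpose(2, 1)
a = a_.to_dense()      # no device-side assert anymore
print(a[0, 8, 5])      # should show 4: the transpose moved (0, 5, 8) to (0, 8, 5)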
How to create a sparse tensor:
Let's take a simpler example. Say we want the following tensor:
0 0 0 2 0
0 0 0 0 0
0 0 0 0 20
[torch.cuda.FloatTensor of size 3x5 (GPU 0)]
As you can see, the number (2) needs to be at position (0, 3) of the sparse tensor, and the number (20) needs to be at position (2, 4).
To create it, our index tensor should look like this:
[[0 , 2],
[3 , 4]]
And now, the code to create the sparse tensor above:
i=torch.LongTensor([[0, 2], [3, 4]])
v=torch.FloatTensor([2, 20])
s=torch.Size([3, 5])
a_ = torch.sparse.FloatTensor(i, v, s).cuda()
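As a quick sanity check (the exact print formatting depends on the PyTorch version), densifying the tensor reproduces the 3x5 matrix above:

print(a_.to_dense())
# Expected values:
# 0  0  0  2   0
# 0  0  0  0   0
# 0  0  0  0  20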
One more comment on the CUDA assertion error:
Assertion 'dstIndex < dstAddDimSize' failed.
tells us that, most likely, one of your indices is out of range. So whenever you notice this error, look for a place where you might be passing a wrong index to one of your tensors.
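One way to catch this kind of mistake before it turns into an opaque device-side assert is to validate the indices on the CPU first. This is a hedged sketch, not part of the original answer, and check_coo_indices is just a hypothetical helper name:

import torch

def check_coo_indices(indices, size):
    # indices is a LongTensor of shape (ndim, nnz); row d holds the
    # coordinates along dimension d for every non-zero value.
    for d in range(indices.size(0)):
        dim_max = int(indices[d].max())
        if dim_max >= size[d]:
            raise ValueError(
                "index {} out of range for dimension {} (size {})"
                .format(dim_max, d, size[d]))

# The original (incorrect) indices from the question fail the check:
i_bad = torch.LongTensor([[0, 5, 8], [1, 5, 8], [2, 5, 8]])
check_coo_indices(i_bad, torch.Size([3, 500, 500]))
# ValueError: index 8 out of range for dimension 0 (size 3)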