Question

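I've read a lot about handling 2D arrays in CUDA, and I gather that you need to flatten them before sending them to the GPU. But can I allocate a 1D array on the GPU and then access it as a 2D array on the GPU? I tried, but it failed. The code is shown below: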

__global__ void kernel( int **d_a )
{ 

   cuPrintf("%p",local_array[0][0]);
}

int main(){

    int **A;

    int i;

    cudaPrintfInit();

    cudaMalloc((void**)&A,16*sizeof(int));

    kernel<<<1,1>>>(A);

    cudaPrintfDisplay(stdout,true);

    cudaPrintfEnd();
}

Answer 1

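In fact, you don't have to "flatten" your 2D array before using it on the GPU (although doing so can speed up memory accesses). If you want a 2D array, you can use something like cudaMallocPitch, which is documented in the CUDA C Programming Guide. I believe your code doesn't work because you only allocated a 1D array, so A[0][0] doesn't exist. If you look at your code, you created a 1D array of ints, not of int*s. If you want to allocate a flattened 2D array, you could do something like this: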

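int *A;
cudaMalloc((void**)&A, rows*cols*sizeof(int)); // rows and cols are the dimensions you want

Then, in your kernel, use the following (it prints a pointer to any element):

__global__ void kernel( int *d_a, int row, int col, int stride )
{ 
  // stride is the number of elements per row (cols for the allocation above)
  printf("%p\n", (void*)&d_a[ col + row*stride ]);
}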
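
For the cudaMallocPitch route mentioned in this answer, here is a minimal sketch of how it might look. The names row_ptr and fill and the 4x4 size are made up for illustration; only cudaMallocPitch, cudaMemcpy2D and cudaFree are actual runtime calls. The pitch is the row width in bytes chosen by the runtime, so rows are stepped through with byte arithmetic:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: pointer to the start of row `row` in a pitched
// allocation. `pitch` is the row width in bytes from cudaMallocPitch.
__host__ __device__ inline int* row_ptr(int* base, size_t pitch, int row)
{
    return (int*)((char*)base + row * pitch);
}

// Illustrative kernel: each thread writes one element of the 2D array.
__global__ void fill(int* d_a, size_t pitch, int rows, int cols)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < rows && col < cols)
        row_ptr(d_a, pitch, row)[col] = row * cols + col;
}

int main()
{
    const int rows = 4, cols = 4;   // assumed sizes, for illustration only
    int* d_a = nullptr;
    size_t pitch = 0;

    // Width is given in bytes, height in rows.
    if (cudaMallocPitch((void**)&d_a, &pitch, cols * sizeof(int), rows) != cudaSuccess) {
        fprintf(stderr, "cudaMallocPitch failed\n");
        return 1;
    }

    dim3 block(cols, rows);
    fill<<<1, block>>>(d_a, pitch, rows, cols);

    // Copy back row by row, dropping the padding at the end of each device row.
    int h_a[rows][cols];
    cudaMemcpy2D(h_a, cols * sizeof(int), d_a, pitch,
                 cols * sizeof(int), rows, cudaMemcpyDeviceToHost);

    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c)
            printf("%3d ", h_a[r][c]);
        printf("\n");
    }

    cudaFree(d_a);
    return 0;
}
```

The padding added by cudaMallocPitch keeps each row aligned, which is why the element address is computed from the pitch rather than from cols.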

Answer 2

This is how I fixed the problem: I used cudaMalloc in the usual way, but when passing the pointer to the kernel I cast it to int (*)[col], and that worked for me.
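
A rough sketch of what that cast might look like, in case it helps. This is my guess at the approach, not the exact code from this answer: ROWS, COLS and the kernel body are made up for illustration, and the column count has to be a compile-time constant for the pointer-to-array type to be valid in CUDA C++.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int ROWS = 4;   // assumed size, for illustration only
constexpr int COLS = 4;   // must be known at compile time for int (*)[COLS]

// The kernel sees the flat allocation as rows of COLS ints,
// so d_a[row][col] indexes element row*COLS + col.
__global__ void kernel(int (*d_a)[COLS])
{
    int row = threadIdx.y;
    int col = threadIdx.x;
    d_a[row][col] = row * COLS + col;
}

int main()
{
    // Ordinary 1D allocation, as in the question.
    int* d_flat = nullptr;
    if (cudaMalloc((void**)&d_flat, ROWS * COLS * sizeof(int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // Cast the flat pointer to a pointer-to-row when launching the kernel.
    kernel<<<1, dim3(COLS, ROWS)>>>(reinterpret_cast<int (*)[COLS]>(d_flat));

    int h_a[ROWS][COLS];
    cudaMemcpy(h_a, d_flat, sizeof(h_a), cudaMemcpyDeviceToHost);

    for (int r = 0; r < ROWS; ++r) {
        for (int c = 0; c < COLS; ++c)
            printf("%3d ", h_a[r][c]);
        printf("\n");
    }

    cudaFree(d_flat);
    return 0;
}
```

No extra memory is involved here: the cast only changes how the same contiguous block is indexed.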

2D arrays in CUDA
