Question

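I have the following CUDA code, in which I'm trying to copy data from the device back to the host.

I can't figure out what exactly I'm doing wrong.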

#include<stdio.h>
#include<stdlib.h>
#define SLAB_SIZE 4
struct SlabList{
        int val[SLAB_SIZE];
        int key[SLAB_SIZE];
        struct SlabList* next;
};
void printList(struct SlabList *node) {
        while (node != NULL) {
                for(int i=0;i<SLAB_SIZE;i++){
                        printf("Key: %d\tValue:%d\n",node->key[i],node->val[i]);
                }
                node = node->next;
        }
}

__global__ void insertKernel(struct SlabList* SL){
        SL->key[0]=1;
        SL->val[0]=2;
        SL->next=NULL;
}
int main(void){
        int N=12;
        struct SlabList* d_SL = NULL;
        cudaMalloc(&d_SL, N * sizeof(struct SlabList));
        insertKernel<<<1,1>>>(d_SL);
        struct SlabList* head = NULL;
        cudaMemcpy(head, d_SL, N * sizeof(struct SlabList), cudaMemcpyDeviceToHost);
        printList(head); // here head is still NULL.
        return 0;
}

Answer 1

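Memory is not allocated implicitly. You have allocated an array in GPU memory, but you have not allocated one in CPU RAM. cudaMemcpy only copies bytes into memory that already exists at the destination pointer; it does not allocate that memory, and because head is passed by value, the call cannot change head itself, which is why it is still NULL afterwards. If you want to use the data on the CPU, you need to create a new variable and allocate host memory for it: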

...
struct SlabList* d_SL = NULL;
cudaMalloc(&d_SL, N * sizeof(struct SlabList));
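struct SlabList* h_SL = NULL;

// Allocate host (CPU) memory to receive the copy.
h_SL = (struct SlabList*)malloc(N * sizeof(struct SlabList));

// Copy N slabs from device memory into the host buffer.
cudaMemcpy(h_SL, d_SL, N * sizeof(struct SlabList), cudaMemcpyDeviceToHost);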
...

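Also note the naming convention: d_SL means "the variable SL, but its version on the device (GPU)". By convention, its counterpart on the CPU is called h_SL, or "host SL". This helps you keep track of which memory space each pointer refers to.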
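For reference, here is a minimal, complete sketch of the question's program with a host buffer allocated before the copy. It assumes nvcc and a CUDA-capable GPU. The CUDA_CHECK macro, the copySlabsToHost and printSlab helpers, and the cudaMemset call that zeroes the device buffer are hypothetical additions for illustration, not part of the original answer. Checking the return value of every runtime call makes failures like the one in the question visible instead of silent.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

#define SLAB_SIZE 4

struct SlabList {
    int val[SLAB_SIZE];
    int key[SLAB_SIZE];
    struct SlabList* next;
};

// Hypothetical helper macro (not part of the CUDA API): abort with
// file/line and the CUDA error string if a runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Same kernel as in the question: fills only the first slab.
__global__ void insertKernel(struct SlabList* SL) {
    SL->key[0] = 1;
    SL->val[0] = 2;
    SL->next = NULL;
}

// Hypothetical helper: allocate a host buffer and copy n slabs into it.
// The caller owns the returned memory and must free() it.
struct SlabList* copySlabsToHost(const struct SlabList* d_SL, int n) {
    size_t bytes = (size_t)n * sizeof(struct SlabList);
    struct SlabList* h_SL = (struct SlabList*)malloc(bytes);
    if (h_SL == NULL) {
        fprintf(stderr, "malloc failed\n");
        exit(EXIT_FAILURE);
    }
    // Blocking copy on the default stream, so it runs after earlier kernels.
    CUDA_CHECK(cudaMemcpy(h_SL, d_SL, bytes, cudaMemcpyDeviceToHost));
    return h_SL;
}

// Print a single slab. A non-NULL `next` copied back from the GPU would hold
// a device address, which must not be dereferenced on the host.
void printSlab(const struct SlabList* node) {
    for (int i = 0; i < SLAB_SIZE; i++) {
        printf("Key: %d\tValue:%d\n", node->key[i], node->val[i]);
    }
}

int main(void) {
    const int N = 12;
    size_t bytes = N * sizeof(struct SlabList);

    // Device buffer (d_ prefix).
    struct SlabList* d_SL = NULL;
    CUDA_CHECK(cudaMalloc(&d_SL, bytes));
    // Zero the buffer so slots the kernel never writes read as 0, not garbage.
    CUDA_CHECK(cudaMemset(d_SL, 0, bytes));

    insertKernel<<<1, 1>>>(d_SL);
    CUDA_CHECK(cudaGetLastError());  // reports kernel launch errors

    // Host buffer (h_ prefix), allocated inside the helper.
    struct SlabList* h_SL = copySlabsToHost(d_SL, N);
    printSlab(&h_SL[0]);

    free(h_SL);
    CUDA_CHECK(cudaFree(d_SL));
    return 0;
}
```

Compiled with something like `nvcc slab.cu -o slab`, the first slab should show key 1 with value 2, followed by three zeroed entries because of the cudaMemset.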
