Question

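I am experimenting with CUDA dynamic parallelism, but I am having trouble getting kernel launches from inside a kernel to work reliably. I wrote this simple test code: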

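// Child kernel: receives the parent's index by value.
__global__ void child(int id) {
    if (id % 10000 == 0)
        printf("hello\n");
}

// Parent kernel: launched over a 2D grid that covers at least nop elements.
__global__ void parent(int nop) {
    unsigned int indX = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int indY = blockIdx.y * blockDim.y + threadIdx.y;
    // Flatten the 2D index, using a row width of floor(sqrt(nop)) + 1.
    unsigned int ind = indX * ((int)sqrtf((float)nop) + 1) + indY;
    if (ind < nop)
    {
        if (ind % 10000 == 0) {
            // Launch one child grid for every 10000th element.
            child<<<1, 1>>>(ind);
            printf("world!");
        }
    }
}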

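Here the value of nop is greater than 1000000. I want to pass a variable created in the parent kernel to the child kernel, but every time I pass it to the child I get an "unspecified launch failure" or a BSOD during the call. I am slowly running out of ideas for how to do this correctly.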
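For reference, here is a minimal sketch of the pattern I understand to be safe, not a confirmed fix for the failure above. Kernel arguments are copied to the child, so a plain value such as the index can be passed directly. A pointer handed to a child must refer to global memory (for example a `cudaMalloc` buffer), never to a local variable or shared memory of the parent. The flat 1D indexing, the `stride` parameter, the `results` buffer and the `child_launch_count` helper are assumptions introduced for illustration and are not part of the code above.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: how many child grids the parent launches, one per
// multiple of stride in [0, nop).
__host__ int child_launch_count(int nop, int stride) {
    return (nop + stride - 1) / stride;
}

// The child gets the index by value and a pointer into global memory.
__global__ void child(int id, int *slot) {
    *slot = id;  // visible to the host once the parent grid has completed
}

__global__ void parent(int nop, int stride, int *results) {
    // Flat 1D index (an assumption to keep the sketch short).
    int ind = blockIdx.x * blockDim.x + threadIdx.x;
    if (ind < nop && ind % stride == 0) {
        // ind is a local variable, so it is passed by value. Passing &ind
        // would give the child a pointer into the parent's local memory,
        // which child grids are not allowed to dereference.
        child<<<1, 1>>>(ind, &results[ind / stride]);
        // Launch errors can be queried from device code as well.
        cudaError_t err = cudaGetLastError();
        if (err != cudaSuccess)
            printf("child launch failed at %d (error %d)\n", ind, (int)err);
    }
}

int main() {
    const int nop = 1000001, stride = 10000;
    const int launches = child_launch_count(nop, stride);

    int *results = nullptr;
    cudaMalloc(&results, launches * sizeof(int));
    cudaMemset(results, 0xFF, launches * sizeof(int));  // every int becomes -1

    const int threads = 256;
    parent<<<(nop + threads - 1) / threads, threads>>>(nop, stride, results);
    // The parent grid is only complete after all of its child grids are.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::vector<int> host(launches);
    cudaMemcpy(host.data(), results, launches * sizeof(int),
               cudaMemcpyDeviceToHost);
    int mismatches = 0;
    for (int k = 0; k < launches; ++k)
        if (host[k] != k * stride) ++mismatches;  // slot k should hold k * stride
    printf("%d child launches, %d mismatches\n", launches, mismatches);

    cudaFree(results);
    return mismatches == 0 ? 0 : 1;
}
```

Dynamic parallelism needs a device of compute capability 3.5 or higher, and the program has to be built with relocatable device code and linked against the device runtime, for example `nvcc -rdc=true example.cu -lcudadevrt` plus a suitable `-arch` flag. Checking the error code after each launch, on the device and on the host, at least narrows down which launch is failing.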

I could not find any useful examples for this case.

Correct communication from parent kernel to child kernel in CUDA dynamic parallelism

0 answers: