Question

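I've been practicing writing CUDA code and learning the structure and ideas behind massively parallel programming. Anyway, I've run into a problem that I don't quite understand.

Here is the code: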

#include <cuda_runtime.h>
#include <stdio.h>
#include <math.h>

__global__ void cudaTest(struct led* input[])
{
    int ledNum = blockIdx.x * blockDim.x + threadIdx.x;
}

int main()
{
    struct led
    {
        unsigned char red, green, blue;
    };

    struct led* input[1200];
    struct led* dInput[1200];

    cudaMalloc((void**)&dInput, sizeof(struct led) * 1200);
    cudaMemcpy(dInput, input, sizeof(struct led) * 1200,     cudaMemcpyHostToDevice);
    cudaTest<<<4, 300>>>(dInput);
    cudaMemcpy(input, dInput, sizeof(struct led) * 1200,    cudaMemcpyDeviceToHost);
    cudaDeviceSynchronize();
    cudaFree(dInput);

    printf("Input: %d", *input);

}

The problem shows up when I compile the program:

testCuda.cu(22): error: argument of type "led **" is incompatible with parameter of type "led **"

cudaTest<<<4, 300>>>(dInput);

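For obvious reasons, I don't understand this... it's basically saying that the same thing is incompatible with itself.

I don't know whether this is a problem with how I'm allocating memory for the array, how I'm initializing it, or something else. Any help is greatly appreciated.

Edit: Just for some context, this code has no application; it's a test program I use to try out code before implementing it in a project. The goal of the program is simple: allocate space for an array on the GPU, transfer it to the GPU, call a kernel, and then transfer it back.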

Answer 1

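When the compiler first encounters struct led as a parameter type in the kernel function, it doesn't know what it is. So you need to define that struct type before using it, even as a function parameter. This construct wouldn't work properly in ordinary C or C++ either, so the underlying concept here isn't specific to CUDA.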
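Since the point isn't CUDA-specific, here is a minimal host-only C++ sketch of it. As far as I understand the C++ scoping rules, mentioning struct led in a parameter list before it is defined declares an incomplete led at file scope, and a struct led defined later inside main() is a separate local type that merely shares the name, which would explain the odd-looking error message. The helper names setRed and localLedIsDifferentType are made up for illustration and are not part of the original code.

```cpp
#include <type_traits>

// Defined at file scope, before any function that mentions it.
struct led
{
    unsigned char red, green, blue;
};

// The parameter type refers to the file-scope led defined above.
void setRed(led* leds, int count, unsigned char value)
{
    for (int i = 0; i < count; ++i)
        leds[i].red = value;
}

// A struct named led defined inside a function is a distinct local type,
// even if its members are identical to the file-scope one.
bool localLedIsDifferentType()
{
    struct led { unsigned char red, green, blue; };  // hides ::led in this scope
    return !std::is_same<led, ::led>::value;
}
```

Calling setRed on an array of the file-scope led compiles fine, while passing a pointer to the function-local led would be rejected as an incompatible type, just like in the question.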

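Also, for dInput, we don't create a host allocation for a pointer that we intend to use as a device pointer. So just declare a bare pointer, then use it in cudaMalloc to attach a device allocation to it.

Try this instead of what you have: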

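#include <cuda_runtime.h>
#include <stdio.h>
#include <math.h>

// Define the struct at file scope, before the kernel that uses it.
struct led
{
    unsigned char red, green, blue;
};

__global__ void cudaTest(led *input)
{
    // One thread per element: 4 blocks * 300 threads covers all 1200 entries.
    int ledNum = blockIdx.x * blockDim.x + threadIdx.x;
    input[ledNum].red = 5;
}

int main()
{
    led* input = new led[1200]();  // host array, zero-initialized
    led* dInput;                   // bare pointer; cudaMalloc attaches the device allocation

    cudaMalloc((void**)&dInput, sizeof(led) * 1200);
    cudaMemcpy(dInput, input, sizeof(led) * 1200, cudaMemcpyHostToDevice);
    cudaTest<<<4, 300>>>(dInput);
    // This blocking copy waits for the kernel to finish before reading the results.
    cudaMemcpy(input, dInput, sizeof(led) * 1200, cudaMemcpyDeviceToHost);
    cudaDeviceSynchronize();
    cudaFree(dInput);

    printf("Input: %d\n", input[0].red);

    delete[] input;
    return 0;
}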
