Question

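I have a problem with my CUDA program! The input is a matrix A (2 x 2) and the output is a matrix A (2 x 2) in which each new value is the old value raised to the **power of 3**. For example: input A: {2,2} {2,2}; output A: {8,8} {8,8}.

I have 2 functions in the file CudaCode.CU: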

__global__ void Power_of_02(int &a)
{
    a = a * a;
}

//***************
__global__ void Power_of_03(int &a)
{
    int tempt = a;
    Power_of_02(a);   // a = a^2
    a = a * tempt;    // a = a^3
}

and the kernel:

__global__ void CudaProcessingKernel(int *dataA)    // kernel function
{
    int bx = blockIdx.x;
    int tx = threadIdx.x;
    int tid = bx * XTHREADS + tx;

    if (tid < 16)
    {
        Power_of_03(dataA[tid]);
    }
    __syncthreads();
}

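I think this is correct, but I get the error: calling a __global__ function("Power_of_02") from a __global__ function("Power_of_03") is only allowed on the compute_35 architecture or above

Why is this wrong, and how can I fix it?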

Answer 1

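The error is fairly explicit. A CUDA function decorated with __global__ represents a kernel. Kernels can be launched from host code. On GPUs of compute capability (cc) 3.5 or higher, you can also launch a kernel from device code. So if you call a __global__ function from device code (i.e. from another CUDA function decorated with __global__ or __device__), you must compile for an appropriate architecture. This is called CUDA dynamic parallelism, and if you want to use it, you should read the documentation to learn how.

When you launch a kernel, whether from host or device code, you must provide a launch configuration, i.e. the information between the triple chevrons: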

CudaProcessingKernel<<<grid, threads>>>(d_A);

If you want to use your power-of-2 code from another kernel, you need to launch it in a similar, appropriate fashion.
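For illustration only, here is a minimal sketch of what such a device-side launch could look like. The names `CubeOne`, `CubeAllParent`, and `cube_on_gpu_dynamic` are hypothetical and not from the question, and the sketch passes a pointer rather than a reference to the child kernel as a simplifying choice. It assumes a GPU of cc 3.5 or higher and a build with relocatable device code, for example `nvcc -rdc=true` for a suitable `-arch`, linked against `cudadevrt`.

```cuda
#include <cuda_runtime.h>

// Child kernel (hypothetical name): cubes a single element in place.
__global__ void CubeOne(int *a)
{
    int v = *a;
    *a = v * v * v;
}

// Parent kernel (hypothetical name): every in-range thread launches a
// child kernel for its own element. Note that the device-side launch
// still needs its own <<<grid, block>>> launch configuration.
__global__ void CubeAllParent(int *data, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        CubeOne<<<1, 1>>>(&data[tid]);
    }
}

// Hypothetical host wrapper: copies n ints to the GPU, runs the parent
// kernel, and copies the results back. Returns 0 on success, -1 on error.
int cube_on_gpu_dynamic(int *h_A, int n)
{
    int *d_A = nullptr;
    size_t bytes = n * sizeof(int);
    if (cudaMalloc(&d_A, bytes) != cudaSuccess) return -1;
    cudaMemcpy(d_A, h_A, bytes, cudaMemcpyHostToDevice);

    int threads = 64;
    int grid = (n + threads - 1) / threads;
    CubeAllParent<<<grid, threads>>>(d_A, n);

    // The parent grid is not complete until its child grids have finished,
    // so synchronizing on the host covers the child launches as well.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess) {
        err = cudaMemcpy(h_A, d_A, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(d_A);
    return err == cudaSuccess ? 0 : -1;
}
```

Launching one child grid per element is wasteful for work this small; the sketch is only meant to show where the launch configuration goes.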

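However, based on the structure of your code, it seems you can make things work by declaring your power-of-2 and power-of-3 functions as __device__ functions: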

__device__ void Power_of_02(int &a)
{
    a = a * a;
}

//***************
__device__ void Power_of_03(int &a)
{
    int tempt = a;
    Power_of_02(a);   // a = a^2
    a = a * tempt;    // a = a^3
}

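This should work for you, and was perhaps your intent. Functions decorated with __device__ are not kernels (so they cannot be called directly from host code), but they can be called directly from device code on any architecture. The programming guide will also help explain the difference.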
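Putting the pieces together, a complete version might look like the sketch below. The host wrapper `cube_on_gpu`, the extra element-count parameter `n`, the block size of 64, and the use of `blockDim.x` in place of the question's `XTHREADS` constant are assumptions made for illustration; the two `__device__` functions are the ones shown above.

```cuda
#include <cuda_runtime.h>

// Device helper: squares its argument in place.
__device__ void Power_of_02(int &a)
{
    a = a * a;
}

// Device helper: cubes its argument in place by squaring it and then
// multiplying by the saved original value.
__device__ void Power_of_03(int &a)
{
    int tempt = a;
    Power_of_02(a);   // a = a^2
    a = a * tempt;    // a = a^3
}

// Kernel: one thread per element. The bound n is passed in (an assumption)
// instead of being hard-coded.
__global__ void CudaProcessingKernel(int *dataA, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        Power_of_03(dataA[tid]);
    }
}

// Hypothetical host wrapper: copies n ints to the GPU, launches the kernel
// with an explicit launch configuration, and copies the results back.
// Returns 0 on success, -1 on error.
int cube_on_gpu(int *h_A, int n)
{
    int *d_A = nullptr;
    size_t bytes = n * sizeof(int);
    if (cudaMalloc(&d_A, bytes) != cudaSuccess) return -1;
    cudaMemcpy(d_A, h_A, bytes, cudaMemcpyHostToDevice);

    int threads = 64;                        // threads per block (assumed)
    int grid = (n + threads - 1) / threads;  // enough blocks to cover n
    CudaProcessingKernel<<<grid, threads>>>(d_A, n);

    cudaError_t err = cudaGetLastError();    // launch-time errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess) {
        err = cudaMemcpy(h_A, d_A, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(d_A);
    return err == cudaSuccess ? 0 : -1;
}
```

Calling `cube_on_gpu(h, 4)` on a host array holding the 2 x 2 matrix `{2, 2, 2, 2}` should leave `{8, 8, 8, 8}` in `h`. Because only `CudaProcessingKernel` is a `__global__` function, no dynamic parallelism is involved.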

Function calling another function in CUDA C++
