Question

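How can I call the 'do_sth' function from inside the 'print' function (see the code)? And why can the GPU see the variable/constant 'N' (see the code) even though I never used cudaMemcpy for it?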

 __device__ void do_sth(char *a, int N)
 {
         int idx = blockIdx.x * blockDim.x + threadIdx.x;
         if(idx < N)
         {       
                 a[idx] = a[idx]; 
         }
 }


 __global__ void print(char *a, int N) 
 {     
         //question_1: why there is an access to N, it is now in GPU memory, how?
         int idx = blockIdx.x * blockDim.x + threadIdx.x;

         //do_sth<<<nblock2,blocksize2>>>(a,N); //error_1: a host function call can not be configured
         //do_sth(&&a,N); //error_2: expected an expression

         if(idx<N)
         {       
                 a[idx]=a[idx];
         }
 }

Answer 1

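A __global__ function (also known as a "kernel") already resides on the GPU. All of its parameters (the variables a and N) are passed through shared or constant memory (depending on your device type) when the kernel is launched, so you can access them directly. Only the pointer value a is passed this way; the buffer it points to must already be in device memory. There is a limit on parameter size: 256B on pre-Fermi cards and ~~16KB(?)~~ 4KB on Fermi, so if you have large chunks of data to transfer, you cannot avoid the cudaMemcpy functions.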
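__global__ function parameters should not be modified.
When calling a __device__ function from a __global__ function, you do not specify configuration parameters in triple angle brackets; the call is written like an ordinary function call, e.g. do_sth(a, N). The __device__ function is called by every thread that reaches the call in your kernel. Note that you can place the call inside an if statement to prevent some threads from executing it.
~~In the current version of CUDA, it is not possible to spawn more threads during kernel execution.~~
There is no unary && operator in CUDA C++ (there was no such operator in standard C++ either; not sure about that now that the new standard is emerging).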
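
The points above can be put together in a small sketch. This is not the asker's program: the names to_upper, upper_kernel and upper_on_gpu are hypothetical and chosen only for illustration. It assumes a CUDA-capable device and compilation with nvcc. The array is moved with cudaMemcpy, while the length n is simply passed by value as a kernel argument, and the __device__ function is called without any <<<...>>> configuration.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Device function (hypothetical): callable only from GPU code,
// invoked like an ordinary C++ function.
__device__ char to_upper(char c)
{
    return (c >= 'a' && c <= 'z') ? static_cast<char>(c - 'a' + 'A') : c;
}

// Kernel (hypothetical): `a` is a device pointer, `n` is copied by value
// at launch, so no cudaMemcpy is needed for `n` itself.
__global__ void upper_kernel(char *a, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        // Plain call, no <<<...>>>; only threads with idx < n reach it.
        a[idx] = to_upper(a[idx]);
    }
}

// Host helper (hypothetical): uppercases the first n chars of `text`
// on the GPU. Returns 0 on success, 1 on any CUDA error.
int upper_on_gpu(char *text, int n)
{
    if (n <= 0) return 0;  // nothing to do; a zero-sized grid is invalid

    char *d_text = nullptr;
    if (cudaMalloc(&d_text, n) != cudaSuccess) return 1;

    // The buffer contents do need an explicit copy to device memory.
    cudaError_t err = cudaMemcpy(d_text, text, n, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const int block = 128;
        const int grid = (n + block - 1) / block;
        upper_kernel<<<grid, block>>>(d_text, n);
        err = cudaGetLastError();  // catches launch errors
    }
    if (err == cudaSuccess) {
        // This blocking copy waits for the kernel on the default stream.
        err = cudaMemcpy(text, d_text, n, cudaMemcpyDeviceToHost);
    }
    cudaFree(d_text);
    return err == cudaSuccess ? 0 : 1;
}

int main()
{
    char text[] = "hello, cuda";
    int n = static_cast<int>(strlen(text));
    if (upper_on_gpu(text, n) != 0) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    printf("%s\n", text);
    return 0;
}
```

On a machine with a working CUDA device this should print HELLO, CUDA. The same pattern applies to the code in the question: inside print, the call would be written as do_sth(a, N).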

Calling a device function from a global function
