Question

Is it possible to pass data from the CPU to the GPU without explicitly passing it as a kernel parameter?

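I don't want to pass it as a parameter, mainly for syntactic-sugar reasons: I need to pass about 20 constant parameters, and I call two kernels in a row with (almost) the same parameters.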

I'd like something along the lines of:

__constant__ int* blah;

__global__ void myKernel(...){
    ... i want to use blah inside ...
}

int main(){
    ...
    cudaMalloc(...allocate blah...)
    cudaMemcpy(copy my array from CPU to blah)

}

Answer 1

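cudaMemcpyToSymbol seems to be the function you're looking for. It works like cudaMemcpy, but takes an additional "offset" argument (a byte offset from the start of the symbol), which looks like it could make copying into 2D arrays easier.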

(I'm hesitant to post code because I can't test it, but see this thread and this post for reference.)
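A minimal sketch of this idea applied to the question's situation, assuming the ~20 parameters can be gathered into one struct. The names `Params`, `c_params`, `kernelA`, `kernelB` and `run_params_demo` are hypothetical. The struct is copied to `__constant__` memory once, both kernels read it without taking it as an argument, and the byte-offset argument of `cudaMemcpyToSymbol` (computed with `offsetof`) overwrites a single field between the two launches, covering the "almost the same parameters" case.

```cuda
#include <cassert>
#include <cstddef>      // offsetof
#include <vector>
#include <cuda_runtime.h>

// Hypothetical struct standing in for the ~20 constant parameters.
struct Params {
    float scale;
    float offset;
    int   n;
};

// One copy in constant memory, visible to every kernel in this .cu file.
__constant__ Params c_params;

__global__ void kernelA(float* data) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < c_params.n) data[i] = data[i] * c_params.scale + c_params.offset;
}

__global__ void kernelB(float* data) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < c_params.n) data[i] = data[i] + c_params.offset;
}

// Runs kernelA, changes one field, runs kernelB; returns data[0] (-1 on error).
float run_params_demo(int n) {
    assert(n > 0);
    Params h = {2.0f, 1.0f, n};
    // Upload all parameters once; neither kernel takes them as arguments.
    if (cudaMemcpyToSymbol(c_params, &h, sizeof(Params)) != cudaSuccess) return -1.0f;

    std::vector<float> host(n, 1.0f);
    float* d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return -1.0f;
    cudaMemcpy(d, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 256, blocks = (n + threads - 1) / threads;
    kernelA<<<blocks, threads>>>(d);   // 1 * 2 + 1 = 3

    // "Almost the same" parameters: overwrite only `offset`, using the
    // byte-offset argument of cudaMemcpyToSymbol.
    float newOffset = 10.0f;
    cudaMemcpyToSymbol(c_params, &newOffset, sizeof(float), offsetof(Params, offset));
    kernelB<<<blocks, threads>>>(d);   // 3 + 10 = 13

    cudaError_t err = cudaMemcpy(host.data(), d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return err == cudaSuccess ? host[0] : -1.0f;
}
```

The device buffer `d` is still an ordinary kernel argument in this sketch; only the parameters live in constant memory. Because the non-async `cudaMemcpyToSymbol` is issued to the default stream, the partial update is ordered after `kernelA` and before `kernelB`.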

Answer 2

Use __device__ to declare a global variable. It is used the same way as __constant__.
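A minimal sketch of this approach for the case where the array size is only known at run time (the question's `cudaMalloc` case): the buffer is allocated with `cudaMalloc` as usual, and the resulting pointer value is then copied into a file-scope `__device__` pointer with `cudaMemcpyToSymbol`. The names `blah_n`, `doubleBlah` and `run_device_pointer_demo` are hypothetical. The same pattern should also work with the question's `__constant__ int* blah;`, since only the pointer itself would live in constant memory while the data stays in global memory.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// File-scope device variables (hypothetical names). The host cannot assign
// them directly, but can copy values into them with cudaMemcpyToSymbol.
__device__ int* blah;    // will hold a pointer returned by cudaMalloc
__device__ int  blah_n;  // number of elements behind blah

// No kernel arguments: everything comes from the __device__ variables.
__global__ void doubleBlah() {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < blah_n) blah[i] *= 2;
}

// Uploads `host`, doubles every element on the GPU, copies the result back.
bool run_device_pointer_demo(std::vector<int>& host) {
    int n = static_cast<int>(host.size());
    assert(n > 0);
    int* d = nullptr;
    if (cudaMalloc(&d, n * sizeof(int)) != cudaSuccess) return false;
    cudaMemcpy(d, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    // Copy the pointer *value* and the length into the device variables.
    cudaMemcpyToSymbol(blah, &d, sizeof(d));
    cudaMemcpyToSymbol(blah_n, &n, sizeof(n));

    doubleBlah<<<(n + 255) / 256, 256>>>();
    cudaError_t err = cudaMemcpy(host.data(), d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return err == cudaSuccess;
}
```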

Answer 3

There are a few approaches you can take; which one fits depends on how you will use the data.

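If the data is read-only and threads (of the same warp) read the same location at the same time, use __constant__ memory, which broadcasts a single read to all of those threads.
If each thread accesses neighbors of a given location (spatial locality), or the access is random (uncoalesced), I recommend texture memory.
If you need to read and write the data and know the array size at compile time, declare it in your device code at file scope as __device__ blah[size].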

For example:

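#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define C_SIZE 16384   // constant memory is limited to 64 KB, i.e. 16384 ints
#define G_SIZE 1048576

__constant__ int c_blah[C_SIZE]; // constant memory
__device__ int g_blah[G_SIZE];   // global memory
// legacy texture reference: must be declared at file scope
// (deprecated, and removed in CUDA 12)
texture<int, 1, cudaReadModeElementType> tref;

__global__ void myKernel() {
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx >= G_SIZE) return;
    // get data from constant memory (index wrapped to stay in bounds)
    int c = c_blah[idx % C_SIZE];
    // get data from global memory
    int g = g_blah[idx];
    // get data from texture memory
    int t = tex1Dfetch(tref, idx);
    // operate
    g_blah[idx] = c + g + t;
}

int main() {
    // host arrays live on the heap: 4 MB arrays on the stack can overflow it
    std::vector<int> c_h_blah(C_SIZE, 1); // initialize as you want
    std::vector<int> g_h_blah(G_SIZE, 2); // initialize as you want
    std::vector<int> t_h_blah(G_SIZE, 3); // initialize as you want
    // copy from host to constant memory and to the __device__ array
    cudaMemcpyToSymbol(c_blah, c_h_blah.data(), C_SIZE * sizeof(int), 0, cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(g_blah, g_h_blah.data(), G_SIZE * sizeof(int), 0, cudaMemcpyHostToDevice);
    // a texture must be bound to device memory, so copy t_h_blah to the GPU first
    int* d_t_blah = nullptr;
    cudaMalloc(&d_t_blah, G_SIZE * sizeof(int));
    cudaMemcpy(d_t_blah, t_h_blah.data(), G_SIZE * sizeof(int), cudaMemcpyHostToDevice);
    cudaBindTexture(0, tref, d_t_blah, G_SIZE * sizeof(int));
    // call your kernel: no arguments needed
    dim3 dimBlock(256);
    dim3 dimGrid((G_SIZE + dimBlock.x - 1) / dimBlock.x);
    myKernel<<<dimGrid, dimBlock>>>();
    // copy result from GPU to CPU memory (__device__ arrays need cudaMemcpyFromSymbol)
    cudaMemcpyFromSymbol(g_h_blah.data(), g_blah, G_SIZE * sizeof(int), 0, cudaMemcpyDeviceToHost);
    printf("g_h_blah[0] = %d\n", g_h_blah[0]); // 1 + 2 + 3 = 6
    cudaUnbindTexture(tref);
    cudaFree(d_t_blah);
    return 0;
}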

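You can use all three arrays inside the kernel without passing any arguments to it. Note that this is only a usage example, not an optimized use of the memory hierarchy; in particular, using constant memory this way is not recommended, because when threads of a warp read different constant-memory addresses, the reads are serialized.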
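The legacy texture-reference API used above (`texture<>`, `cudaBindTexture`, `tex1Dfetch(tref, idx)`) is deprecated and was removed in CUDA 12.0. A minimal sketch of the texture-object equivalent, assuming the texture handle, like the other data, should not be a kernel argument: a `cudaTextureObject_t` is an ordinary 64-bit handle, so here it is stored in a `__constant__` variable. The names `c_tex`, `g_out`, `kN`, `copyFromTexture` and `run_texture_object_demo` are hypothetical.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int kN = 1024;                 // hypothetical fixed array size

__constant__ cudaTextureObject_t c_tex;  // texture handle in constant memory
__device__ int g_out[kN];                // result in a __device__ array

// No kernel arguments: reads through the texture, writes to g_out.
__global__ void copyFromTexture() {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < kN) g_out[i] = tex1Dfetch<int>(c_tex, i) + 1;
}

// Uploads `host`, adds 1 to each element via the texture path, copies back.
bool run_texture_object_demo(std::vector<int>& host) {
    assert(host.size() == static_cast<size_t>(kN));
    int* d_in = nullptr;
    if (cudaMalloc(&d_in, kN * sizeof(int)) != cudaSuccess) return false;
    cudaMemcpy(d_in, host.data(), kN * sizeof(int), cudaMemcpyHostToDevice);

    // Describe the linear device buffer and create a texture object over it.
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = d_in;
    res.res.linear.desc = cudaCreateChannelDesc<int>();
    res.res.linear.sizeInBytes = kN * sizeof(int);
    cudaTextureDesc td = {};
    td.readMode = cudaReadModeElementType;
    cudaTextureObject_t tex = 0;
    if (cudaCreateTextureObject(&tex, &res, &td, nullptr) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }

    // Store the handle in constant memory instead of passing it to the kernel.
    cudaMemcpyToSymbol(c_tex, &tex, sizeof(tex));
    copyFromTexture<<<(kN + 255) / 256, 256>>>();
    cudaError_t err = cudaMemcpyFromSymbol(host.data(), g_out, kN * sizeof(int));

    cudaDestroyTextureObject(tex);
    cudaFree(d_in);
    return err == cudaSuccess;
}
```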

Hope this helps.
