Question

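I know how to do this in CUDA, but basically I want to pass a small, variable number (0-5) of __global pointers to a function and then load those pointers into local or private memory. The count is small and I already have a local memory fence in the kernel, so I'm not sure which is faster; I'll determine that experimentally once it is implemented. So I wrote a kernel like this: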

__kernel void foo(
  __global int* img,
  __global int** img_history,
  __private int** private_history,
  uint history_length)//could be local
{
    for (int i = 0; i < history_length; i++)
       private_history[i] = img_history[i];
}

To clarify, in CUDA I do this:

__global__ void foo(int* img, int** img_history, unsigned int history_length)
{
    int* private_history[10]; // at most 10 entries
    for (unsigned int i = 0; i < history_length; i++)
        private_history[i] = img_history[i];
}

and load it with:

int** host_array = new int*[history_length];
for (int i = 0; i < history_length; i++)
    cudaMalloc(host_array+i,size);
int** device_array;
cudaMalloc(&device_array,sizeof(int*)*history_length);
cudaMemcpy(device_array, host_array, sizeof(int*)*history_length, cudaMemcpyHostToDevice);

However, I get the error "error: invalid address space for pointee of pointer argument to __kernel function". What is the correct way to do this?

Answer 1

I don't know how you do it in CUDA, but this is completely forbidden as an OpenCL kernel argument.

You cannot copy a pointer value to the device and then use it directly, because the memory addresses are different.

To do this, you need to:

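Copy only indices into the image table in img_history, not pointers.
Operate on those indices as needed (integer arithmetic).
Use those indices to access the image table or do whatever else you need. If you need to access img through those indices, img must be a kernel argument, and you have to copy all of it (the full-length img array).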
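
The steps above can be sketched on the host side as follows. This is only an illustration, not the only way to lay out the data: the type FlatHistory and the function flatten_history are hypothetical names introduced here, and the images are held in std::vector purely for the example. The idea is to pack every image into one contiguous buffer and record the start index of each image, so that the device receives integers rather than host pointers.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper type: one contiguous buffer plus a table of start offsets.
struct FlatHistory {
    std::vector<int> data;     // all images back to back (would back the img buffer)
    std::vector<int> offsets;  // offsets[i] = index in data where image i starts
                               // (would back the img_history buffer)
};

// Pack a jagged set of images into one buffer and record where each image begins.
FlatHistory flatten_history(const std::vector<std::vector<int>>& images) {
    FlatHistory flat;
    for (const std::vector<int>& image : images) {
        // The next image starts at the current end of the packed buffer.
        flat.offsets.push_back(static_cast<int>(flat.data.size()));
        flat.data.insert(flat.data.end(), image.begin(), image.end());
    }
    return flat;
}
```

Under these assumptions, flat.data and flat.offsets would each be uploaded as an ordinary buffer (for example with clCreateBuffer) and bound to the img and img_history kernel arguments.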

Example:

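#define MAX_HISTORY 10 // assumed upper bound, matching the CUDA version in the question

__kernel void foo(
  __global int* img,          // all images packed into one buffer
  __global int* img_history,  // start index of each history image within img
  uint history_length)
{
    // Kernel pointer arguments cannot be __private, so the copy is a
    // private array declared in the kernel body (could be __local instead).
    int private_history[MAX_HISTORY];
    for (uint i = 0; i < history_length; i++)
       private_history[i] = img_history[i];

    /* img[private_history[i]] */ //Use it as you wish
}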
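
To check the indexing logic without an OpenCL device, a single work-item can be emulated on the CPU. The sketch below is an assumption-laden illustration rather than part of the answer: sum_history_at and MAX_HISTORY are hypothetical names, gid stands in for get_global_id(0), and every history image is assumed to be long enough that offset + gid stays inside the packed buffer.

```cpp
#include <cassert>
#include <cstddef>

// Assumed upper bound on the number of history images (the question uses 10).
constexpr unsigned MAX_HISTORY = 10;

// CPU stand-in for one work-item. img is the packed buffer holding every
// history image; img_history[i] is the start index of image i inside img.
int sum_history_at(const int* img, const int* img_history,
                   unsigned history_length, std::size_t gid) {
    assert(history_length <= MAX_HISTORY);

    // Copy the small offset table into a per-work-item array
    // (private memory in OpenCL terms).
    int private_history[MAX_HISTORY];
    for (unsigned i = 0; i < history_length; i++)
        private_history[i] = img_history[i];

    // Use each offset as a base index into the packed buffer.
    int sum = 0;
    for (unsigned i = 0; i < history_length; i++)
        sum += img[private_history[i] + gid];
    return sum;
}
```

Summing the same pixel across all history images is just a placeholder for whatever per-pixel work the real kernel does; the point is that only integer offsets, never pointers, cross the host/device boundary.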

Passing a jagged array to a kernel in OpenCL
