I am trying to port code originally written in CUDA to OpenCL so that it can run on an Altera FPGA. I am having trouble reading back data that should be sitting in a buffer. I use the same structure as the CUDA version; the only difference is that cudaMalloc can allocate memory for any type of pointer, whereas with clCreateBuffer I have to use cl_mem. My code looks like this:
cl_mem d_buffer=clCreateBuffer(...);
//CUDA version:
//float* d_buffer;
//cudaMalloc((void **)&d_buffer, MemSz);
clEnqueueWriteBuffer(queue, d_buffer, ..., h_data, );
//cudaMemcpy(d_buffer, h_Data, MemSz, cudaMemcpyHostToDevice);
#define d_buffer(index1, index2, index3) &d_buffer + index1/index2*index3
//#define d_buffer(index1, index2, index3) d_buffer + index1/index2*index3
cl_mem* d_data=d_buffer(1,2,3);
clEnqueueReadBuffer(queue, *d_data,...)// Error reading d_data
I have also tried clEnqueueMapBuffer and passing CL_MEM_ALLOC_HOST_PTR to clCreateBuffer, but that did not work either.
Answer (score: 1)
cl_mem is an opaque handle object. You should not perform pointer arithmetic on it; attempting to do so will result in very nasty bugs.
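If you want to keep the lookup-macro form from the question, a minimal sketch (using a hypothetical D_BUFFER_OFFSET name) is to have the macro produce a byte offset that is passed to clEnqueueReadBuffer, rather than doing arithmetic on the cl_mem handle itself:
// Sketch only: compute a byte offset into the buffer, not a pointer.
// Note that 1/2*3 is integer division and evaluates to 0.
#define D_BUFFER_OFFSET(index1, index2, index3) \
    ((size_t)((index1) / (index2) * (index3)) * sizeof(float))
float result;
clEnqueueReadBuffer(queue, d_buffer, CL_TRUE, D_BUFFER_OFFSET(1, 2, 3), sizeof(float), &result, 0, nullptr, nullptr);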
I'm not familiar with how CUDA handles buffer allocation, but the implication of your commented-out code is that CUDA buffers are always host-visible. That is not the case in OpenCL: OpenCL allows you to "map" a buffer into host-visible memory, but the buffer is not implicitly visible to the host. If you intend to read from an arbitrary index of the buffer, you need to either map it first or copy it to host data.
float * h_data = new float[1000];
cl_mem d_buffer=clCreateBuffer(...);
clEnqueueWriteBuffer(queue, d_buffer, true, 0, 1000 * sizeof(float), h_data, 0, nullptr, nullptr);
//======OR======
//float * d_data = static_cast<float*>(clEnqueueMapBuffer(queue, d_buffer, true, CL_MAP_WRITE, 0, 1000 * sizeof(float), 0, nullptr, nullptr, nullptr));
//std::copy(h_data, h_data + 1000, d_data);
//clEnqueueUnmapMemObject(queue, d_buffer, d_data, 0, nullptr, nullptr);
//clEnqueueBarrier(queue);
//Do work with buffer, probably in OpenCL Kernel...
float result;
size_t index = 1 / 2 * 3; //This is what you wrote in the original post (integer division: it evaluates to 0)
clEnqueueReadBuffer(queue, d_buffer, true, index * sizeof(float), 1 * sizeof(float), &result, 0, nullptr, nullptr);
//======OR======
//float * result_ptr = static_cast<float*>(clEnqueueMapBuffer(queue, d_buffer, true, CL_MAP_READ, index * sizeof(float), 1 * sizeof(float), 0, nullptr, nullptr, nullptr));
//result = *result_ptr;
//clEnqueueUnmapMemObject(queue, d_buffer, result_ptr, 0, nullptr, nullptr);
//clEnqueueBarrier(queue);
std::cout << "Result was " << result << std::endl;