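I am trying to implement a Gaussian filter for images with Python and PyOpenCL, based on code I found online. My original images are numpy arrays, but I am confused about how I should pass an image to the GPU.
Originally, the kernel received OpenCL images as input. That works and the kernel runs correctly, but I have not found a way to convert the output of the GPU computation (which is also an OpenCL image) into a numpy array. I need this because I have to perform other calculations after running the GPU filter.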
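I tried using PyOpenCL arrays, but in that case there are two problems:
1. I do not know how to make this work with read_imagef, the function I use in my kernel.
2. I get a "cl_array has no module get()" error.
I would like to know:
1. How do I tell the kernel that the input is an array, in the same way that image2d_t says the input is an image?
2. What is the array equivalent of read_imagef?
Thanks a lot in advance. The kernel code is below: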
    __kernel void gaussian(__read_only image2d_t inputImage,
                           __read_only image2d_t filterImage,
                           __write_only image2d_t outputImage,
                           const int nInWidth,
                           const int nFilterWidth){
        const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;
        const int xOut = get_global_id(0);
        const int yOut = get_global_id(1);
        float4 sum = (float4)(0.0f, 0.0f, 0.0f, 1.0f);
        for(int r = 0; r < nFilterWidth; r++){
            for(int c = 0; c < nFilterWidth; c++){
                // Vector literals need an explicit (int2) cast; a bare (a, b) is the comma operator.
                int2 location = (int2)(xOut + r, yOut + c);
                // The filter is indexed by its own offsets, not by the output pixel position.
                float4 filterVal = read_imagef(filterImage, sampler, (int2)(r, c));
                float4 inputVal = read_imagef(inputImage, sampler, location);
                sum.x += filterVal.x * inputVal.x;
                sum.y += filterVal.y * inputVal.y;
                sum.z += filterVal.z * inputVal.z;
                sum.w = 1.0f;
            }
        }
        int2 outLocation = (int2)(xOut, yOut);
        write_imagef(outputImage, outLocation, sum);
    }
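Answer 0 (score: 2)
This is a complex question, and since I ran into the same problems myself, I will try to answer them in detail. Let's break your question into smaller parts and see what is going on.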
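Data types
You seem to be mixing up some data types. OpenCL itself works with either images or arrays. A PyOpenCL Array maps to an OpenCL array, and a PyOpenCL Image maps to an OpenCL image. Mixing the two works in a few special cases, but in general it is not a good idea.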
Data access
Images in OpenCL need a sampler to be read from. Arrays can be accessed by plain index, just as in Python. (For more on the problems I ran into there, see here or here.)
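Data movement
Everything that PyOpenCL moves to or from the device has its own copy function. So, to move an image or an array from the device to the host, make sure you enqueue the matching copy function on the command queue of your context.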
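As a rough sketch of the copy-back step for the image case in the question: the helper names below (rgba_host_shape, image_to_numpy) are hypothetical, and the code assumes a 2D float RGBA image, as written by write_imagef in the kernel. The only PyOpenCL call used is pyopencl.enqueue_copy.

```python
def rgba_host_shape(width, height):
    """NumPy shape for a float RGBA image: rows first, then columns, then 4 channels."""
    return (height, width, 4)


def image_to_numpy(queue, image, width, height):
    """Copy a 2D float RGBA pyopencl.Image from the device into a new NumPy array.

    Hypothetical helper; assumes the image format is RGBA / FLOAT.
    """
    # Imported here so the shape helper above can be used without an OpenCL runtime.
    import numpy as np
    import pyopencl as cl

    host = np.empty(rgba_host_shape(width, height), dtype=np.float32)
    # For images, origin and region are given in (x, y) order, i.e. (width, height).
    cl.enqueue_copy(queue, host, image, origin=(0, 0), region=(width, height))
    return host
```

Note the axis order: the OpenCL region is (width, height), while the NumPy array is (height, width, 4). For a pyopencl.array.Array, calling get() on the array object performs the corresponding device-to-host copy.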
Answer 1 (score: 0)
The underlying OpenCL data structure of pyopencl.Array is the so-called buffer. You can retrieve the buffer object via the base_data attribute of the Array (see the docs). The buffer can be passed in a kernel call; however, the kernel has to be adjusted to handle buffers instead of images (change the kernel argument type to __global float* inputImage etc., and access elements with ordinary array indexing, computing the flat offset as y * width + x for a 2D image).
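To make the buffer variant concrete, here is a minimal sketch, assuming a single-channel float32 image stored row-major and a square filter anchored at its top-left corner, as in the question's kernel. The kernel name gaussian_buf and the helpers flat_index and run_buffer_filter are made up for illustration, and the code has not been run against a real device.

```python
KERNEL_SRC = """
__kernel void gaussian_buf(__global const float *input,
                           __global const float *filter,
                           __global float *output,
                           const int width,
                           const int height,
                           const int filterWidth)
{
    const int x = get_global_id(0);
    const int y = get_global_id(1);
    float sum = 0.0f;
    for (int r = 0; r < filterWidth; r++) {
        for (int c = 0; c < filterWidth; c++) {
            // Buffers have no sampler, so clamp to the edge by hand.
            const int ix = min(x + c, width - 1);
            const int iy = min(y + r, height - 1);
            sum += filter[r * filterWidth + c] * input[iy * width + ix];
        }
    }
    output[y * width + x] = sum;
}
"""


def flat_index(x, y, width):
    """Row-major offset of pixel (x, y); the same formula the kernel uses."""
    return y * width + x


def run_buffer_filter(queue, program, input_array, filter_array):
    """Run gaussian_buf on two 2D float32 pyopencl arrays and return a NumPy array.

    `program` is assumed to be cl.Program(ctx, KERNEL_SRC).build().
    """
    # Imported here so flat_index can be used without an OpenCL runtime.
    import numpy as np
    import pyopencl.array as cl_array

    height, width = input_array.shape
    output_array = cl_array.zeros_like(input_array)
    # base_data is the underlying buffer; this assumes the arrays have zero offset,
    # which holds for arrays freshly created with to_device or zeros_like.
    program.gaussian_buf(
        queue, (width, height), None,
        input_array.base_data, filter_array.base_data, output_array.base_data,
        np.int32(width), np.int32(height), np.int32(filter_array.shape[0]),
    )
    return output_array.get()
```

In this variant the buffer argument type (__global const float *) plays the role of image2d_t, and the index expression plays the role of read_imagef.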
Anyway, the PyOpenCL Array class is designed to write code using numpy style that will be executed on the device. This does not require you to write any kernel code yourself anymore. Instead, you could do something like this:
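    import pyopencl as cl
    import pyopencl.array  # makes cl.array available

    # Assumes input_numpy_array and filter_numpy_array are existing 2D float32 arrays.
    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)

    input_array = cl.array.to_device(queue, input_numpy_array)
    filter_array = cl.array.to_device(queue, filter_numpy_array)
    output_array = cl.array.zeros_like(input_array)

    # half height and half width of filter
    fhh, fhw = filter_array.shape[0] // 2, filter_array.shape[1] // 2

    # Interior pixels only, so the patch slice never runs off the image.
    for y in range(fhh, input_array.shape[0] - fhh):
        for x in range(fhw, input_array.shape[1] - fhw):
            patch = input_array[y-fhh:y+fhh+1, x-fhw:x+fhw+1]
            total = cl.array.sum(patch * filter_array)
            output_array[y, x] = total

    output_numpy_array = output_array.get()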
Note that I assumed a single-channel (gray) image. I also did not test the code above, and I expect this implementation to be horribly inefficient. Edge handling is not covered.
Finally, given your kernel, you should consider not using PyOpenCL Arrays at all. Create pyopencl.Image objects from your numpy arrays and pass them in the kernel call. This way, you don't have to modify your kernel.
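A minimal sketch of that route, assuming the input and the filter are C-contiguous float32 numpy arrays of shape (height, width, 4), and that program was built from the kernel in the question. The helper names image_shape_from_numpy and run_gaussian_on_images are hypothetical, and the code is untested against a real device.

```python
def image_shape_from_numpy(shape):
    """Convert a NumPy (height, width, channels) shape to OpenCL's (width, height)."""
    height, width = shape[0], shape[1]
    return (width, height)


def run_gaussian_on_images(ctx, queue, program, input_rgba, filter_rgba):
    """Run the question's image-based kernel and return the result as a NumPy array."""
    # Imported here so the shape helper above can be used without an OpenCL runtime.
    import numpy as np
    import pyopencl as cl

    # image_from_array copies the host data into a read-only device image.
    input_image = cl.image_from_array(ctx, input_rgba, num_channels=4, mode="r")
    filter_image = cl.image_from_array(ctx, filter_rgba, num_channels=4, mode="r")

    # The output image has to be allocated explicitly, with a matching format.
    width, height = image_shape_from_numpy(input_rgba.shape)
    fmt = cl.ImageFormat(cl.channel_order.RGBA, cl.channel_type.FLOAT)
    output_image = cl.Image(ctx, cl.mem_flags.WRITE_ONLY, fmt, shape=(width, height))

    # One work-item per output pixel; scalar arguments need explicit 32-bit types.
    program.gaussian(
        queue, (width, height), None,
        input_image, filter_image, output_image,
        np.int32(width), np.int32(filter_rgba.shape[0]),
    )

    # Copy the output image back to the host so numpy code can continue from here.
    result = np.empty_like(input_rgba)
    cl.enqueue_copy(queue, result, output_image, origin=(0, 0), region=(width, height))
    return result
```

A gray image would first need to be expanded to four channels on the host, since this sketch assumes RGBA throughout.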