Question

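I am using PyOpenCL to process images in Python and am sending a 3D numpy array (height x width x 4) to the kernel. I am having trouble indexing the 3D array inside the kernel code. For now I can only copy the whole input array to the output. The current code looks like this, where img is the image with img.shape = (320, 512, 4):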

__kernel void part1(__global float* img, __global float* results)
{
    unsigned int x = get_global_id(0);
    unsigned int y = get_global_id(1);
    unsigned int z = get_global_id(2);

    int index = x + 320*y + 320*512*z;

    results[index] = img[index];
}

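However, I do not really understand how this works. For example, how do I index the equivalent of Python's img[1, 2, 3] in this kernel? Also, which index into results should I use in the kernel so that an item ends up at results[1, 2, 3] in the numpy array once the results are back in Python?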

To run it, I am using this Python code:

import pyopencl as cl
import numpy as np

class OpenCL:
    def __init__(self):
        self.ctx = cl.create_some_context()
        self.queue = cl.CommandQueue(self.ctx)

    def loadProgram(self, filename):
        with open(filename, 'r') as f:
            fstr = f.read()
        self.program = cl.Program(self.ctx, fstr).build()

    def opencl_energy(self, img):
        mf = cl.mem_flags

        self.img = img.astype(np.float32)

        self.img_buf = cl.Buffer(self.ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=self.img)
        self.dest_buf = cl.Buffer(self.ctx, mf.WRITE_ONLY, self.img.nbytes)

        self.program.part1(self.queue, self.img.shape, None, self.img_buf, self.dest_buf)
        c = np.empty_like(self.img)
        # enqueue_copy replaces the older enqueue_read_buffer call
        cl.enqueue_copy(self.queue, c, self.dest_buf).wait()
        return c

example = OpenCL()
example.loadProgram("get_energy.cl")
image = np.random.rand(320, 512, 4)
image = image.astype(np.float32)
results = example.opencl_energy(image)
print("All items are equal:", (results==image).all())

Answer 1

Update: The OpenCL documentation states (in 3.5):

"Memory objects are categorized into two types: buffer objects, and image objects. A buffer
object stores a one-dimensional collection of elements whereas an image object is used to store a
two- or three- dimensional texture, frame-buffer or image."

So buffers are always linear, or linearized, as the example below shows.

import pyopencl as cl
import numpy as np


h_a = np.arange(27).reshape((3,3,3)).astype(np.float32)

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

mf = cl.mem_flags
d_a  = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=h_a)

prg = cl.Program(ctx, """
__kernel void p(__global const float *d_a) {
  printf("Array element is %f ",d_a[10]);
}
""").build()

prg.p(queue, (1,), None, d_a)

gives me

"Array element is 10"

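as output. So a buffer really is a linearized array. However, the naive [x, y, z] indexing known from numpy does not work that way inside the kernel. Using a 2D or 3D image instead of a buffer should work.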
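To make the linearization concrete for the image in the question, here is a minimal NumPy-only sketch (no OpenCL device needed). It assumes the array is C-contiguous, which is NumPy's default layout; the helper name `flat_index` is made up for illustration.

```python
import numpy as np

def flat_index(shape, i, j, k):
    """Row-major (C order) offset of element [i, j, k], counted in elements."""
    height, width, channels = shape
    return (i * width + j) * channels + k

shape = (320, 512, 4)  # (height, width, channels), as in the question
img = np.arange(np.prod(shape), dtype=np.float32).reshape(shape)

idx = flat_index(shape, 1, 2, 3)
# NumPy's own helper computes the same offset
assert idx == np.ravel_multi_index((1, 2, 3), shape)
# The flattened array is what a buffer created from img would hold
assert img.ravel()[idx] == img[1, 2, 3]
print(idx)  # prints 2059
```

Under that assumption, the kernel-side equivalent of img[1, 2, 3] would be img[(1 * 512 + 2) * 4 + 3], with the last NumPy axis varying fastest.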

Answer 2

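Although this is not the best solution, I linearized the array in Python and sent it as a 1D array. In the kernel code, I computed x, y and z from the linear index. Back in Python, I reshaped the result to the original shape.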
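A rough sketch of that round trip, emulated on the host with NumPy only so it runs without an OpenCL device. The function `unflatten` is a hypothetical stand-in for the arithmetic a kernel could perform on `get_global_id(0)`, and it assumes a C-contiguous array.

```python
import numpy as np

def unflatten(idx, shape):
    """Recover (i, j, k) from a row-major linear index, as a kernel could."""
    _, width, channels = shape
    k = idx % channels
    j = (idx // channels) % width
    i = idx // (channels * width)
    return i, j, k

img = np.random.rand(320, 512, 4).astype(np.float32)
flat = img.ravel()            # 1D data that would be sent to the device
out = np.empty_like(flat)     # stands in for the 1D result buffer

for idx in (0, 2059, flat.size - 1):  # a few sample work-item ids
    i, j, k = unflatten(idx, img.shape)
    assert (i, j, k) == np.unravel_index(idx, img.shape)
    out[idx] = img[i, j, k]

result = out.reshape(img.shape)  # back to (320, 512, 4) on the host
assert result[1, 2, 3] == img[1, 2, 3]
```

In OpenCL C the same decomposition would use `%` and integer `/` on the linear index; only three sample indices are filled in here, whereas a real kernel would run once per element.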

Answer 3

I had the same problem. At https://lists.tiker.net/pipermail/pyopencl/2009-October/000134.html there is a simple example of how to use 3D arrays with PyOpenCL that worked for me. I am quoting the code here for future reference:

import pyopencl as cl
import numpy
import numpy.linalg as la

sizeX=4
sizeY=2
sizeZ=5
a = numpy.random.rand(sizeX,sizeY,sizeZ).astype(numpy.float32)

ctx = cl.Context()
queue = cl.CommandQueue(ctx)

mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
dest_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

prg = cl.Program(ctx, """
    __kernel void sum(__global const float *a, __global float *b)
    {
      int x = get_global_id(0);
      int y = get_global_id(1);
      int z = get_global_id(2);

      int idx = z * %d * %d + y * %d + x;

      b[idx] = a[idx] * x + 3 * y + 5 * z;
    }
    """ % (sizeY, sizeX, sizeX) ).build()

prg.sum(queue, a.shape, None, a_buf, dest_buf)  # None lets the runtime pick the local size
cl.enqueue_copy(queue, a, dest_buf).wait()      # replaces the older enqueue_read_buffer
print(a)
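One caveat that may be worth checking on the quoted example, sketched below with NumPy only: since the global size is a.shape, x runs over the first axis, yet idx treats x as the fastest-varying index. That matches column-major (Fortran) order, so idx would not address a[x, y, z] in a default C-ordered array. The helper names `kernel_idx` and `c_order_idx` are hypothetical.

```python
import numpy as np

sizeX, sizeY, sizeZ = 4, 2, 5
shape = (sizeX, sizeY, sizeZ)

def kernel_idx(x, y, z):
    """Index formula used in the quoted kernel (x varies fastest)."""
    return z * sizeY * sizeX + y * sizeX + x

def c_order_idx(x, y, z):
    """Offset of a[x, y, z] in a C-contiguous array (z varies fastest)."""
    return (x * sizeY + y) * sizeZ + z

a = np.arange(sizeX * sizeY * sizeZ, dtype=np.float32).reshape(shape)
flat = a.ravel()

for x, y, z in np.ndindex(*shape):
    # The quoted formula equals the Fortran-order offset of (x, y, z)
    assert kernel_idx(x, y, z) == np.ravel_multi_index((x, y, z), shape, order="F")
    # The C-order formula addresses the element NumPy calls a[x, y, z]
    assert flat[c_order_idx(x, y, z)] == a[x, y, z]

print(kernel_idx(1, 0, 0), c_order_idx(1, 0, 0))  # prints 1 10
```

The quoted formula still visits every element exactly once, so the kernel runs without error, but the x, y and z it multiplies by would not be the NumPy indices of the element being read. Using the C-order formula in the kernel should make them line up.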

PyOpenCL indexing in kernel code
