Question

I am trying to compute the average luminance of an RGB image. To do this, I find the luminance of each pixel, i.e.

L(r,g,b) = X*r + Y*g + Z*b (some linear combination).

The average is then found by summing the luminance of all pixels and dividing by width*height. To speed this up, I am using pyopencl.reduction.ReductionKernel.

The array I pass to it is a one-dimensional NumPy array, so it works just like the example given.

from PIL import Image
import numpy as np
im = Image.open('image_00000001.bmp')
data = np.asarray(im).reshape(-1) # flatten to a one-dimensional array
# data.dtype is uint8, data.shape is (w*h*3, )
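For reference, here is a minimal host-side NumPy sketch of the quantity I am after, useful for checking any OpenCL result against. The weights are an assumption (Rec. 709 coefficients); the formula above leaves X, Y and Z open. The function name `mean_luminance` is made up for illustration.

```python
import numpy as np

# Assumed weights (Rec. 709); X, Y, Z are not fixed in the question.
WEIGHTS = np.array([0.2126, 0.7152, 0.0722], dtype=np.float32)

def mean_luminance(flat_rgb):
    """Average luminance of a flat uint8 array laid out as r,g,b,r,g,b,..."""
    pixels = flat_rgb.reshape(-1, 3).astype(np.float32)  # one row per pixel
    per_pixel = pixels @ WEIGHTS                          # L(r,g,b) for each pixel
    return float(per_pixel.sum() / per_pixel.size)        # sum / (width*height)

# Tiny two-pixel example: one white pixel and one black pixel.
data = np.array([255, 255, 255, 0, 0, 0], dtype=np.uint8)
print(mean_luminance(data))
```

The reshape to (-1, 3) is what groups every three consecutive bytes into one pixel, which is exactly the grouping the reduction kernel would need to reproduce.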

I want to adapt the following code from the example, i.e. I would change the data type and the kind of array I pass in. This is the example:

a = pyopencl.array.arange(queue, 400, dtype=numpy.float32)
b = pyopencl.array.arange(queue, 400, dtype=numpy.float32)

krnl = ReductionKernel(ctx, numpy.float32, neutral="0",
        reduce_expr="a+b", map_expr="x[i]*y[i]",
        arguments="__global float *x, __global float *y")

my_dot_prod = krnl(a, b).get()

The difference is that my map_expr would operate on each pixel and convert it to its luminance value, while the reduce expression stays the same.

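The problem is that it operates on each element of the array, whereas I need it to operate on each pixel, i.e. on each group of 3 consecutive elements (RGB).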

One solution would be to have three separate arrays, one for R, one for G and one for B. That would work, but is there another way?

Answer 1

Edit: I changed the program to illustrate char4 usage instead of float4:

import numpy as np
import pyopencl as cl
import pyopencl.array as cl_array


deviceID = 0
platformID = 0
workGroup=(1,1)

N = 10
testData = np.zeros(N, dtype=cl_array.vec.char4)

dev = cl.get_platforms()[platformID].get_devices()[deviceID]

ctx = cl.Context([dev])
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags
Data_In = cl.Buffer(ctx, mf.READ_WRITE, testData.nbytes)


prg = cl.Program(ctx, """

__kernel void   Pack_Cmplx( __global char4* Data_In, int  N)
{
  int gid = get_global_id(0);

  //Data_In[gid] = 1; // This would change all components to one
  Data_In[gid].x = 1;  // changing single component
  Data_In[gid].y = 2;
  Data_In[gid].z = 3;
  Data_In[gid].w = 4;
}
 """).build()

prg.Pack_Cmplx(queue, (N,1), workGroup, Data_In, np.int32(N))
cl.enqueue_copy(queue, testData, Data_In)
print(testData)

I hope this helps.
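To connect this to the 24-bit RGB data in the question: a 4-component vector type needs 4 bytes per pixel, so the r,g,b,r,g,b,... bytes would have to be padded first. The sketch below does that padding on the host with NumPy only. `PIXEL4` and `pack_rgb_as_pixel4` are hypothetical names, and the structured dtype is just a stand-in for a 4-byte OpenCL vector type. Since OpenCL's char is signed, an unsigned variant such as uchar4 is probably the better fit for channel values in 0-255.

```python
import numpy as np

# Hypothetical 4-byte pixel type standing in for an OpenCL uchar4.
PIXEL4 = np.dtype([("x", np.uint8), ("y", np.uint8),
                   ("z", np.uint8), ("w", np.uint8)])

def pack_rgb_as_pixel4(flat_rgb):
    """Pad r,g,b,r,g,b,... to r,g,b,0 and view each group as one element."""
    rgb = flat_rgb.reshape(-1, 3)                      # one row per pixel
    padded = np.zeros((rgb.shape[0], 4), dtype=np.uint8)
    padded[:, :3] = rgb                                # fourth byte stays 0
    return padded.view(PIXEL4).reshape(-1)             # one element per pixel

data = np.array([10, 20, 30, 40, 50, 60], dtype=np.uint8)
pixels = pack_rgb_as_pixel4(data)
print(pixels.shape)
print(pixels["x"], pixels["y"], pixels["z"])
```

With one array element per pixel, a kernel can read the components as `.x`, `.y` and `.z` of a single index, as the kernel above does when writing them.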

PyOpenCL reduction kernel on each pixel of an image as an array instead of each byte (RGB mode, 24-bit)
