I'm trying to sum two arrays with pyopencl, but I get strange numbers in the output.
Code:
```python
import numpy
import pyopencl as cl

def sum_arrays_with_cl(array1, array2):
    """
    Sums 2 arrays with GPU.
    """
    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    mf = cl.mem_flags
    a_array = numpy.array(array1)
    b_array = numpy.array(array2)
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a_array)
    b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b_array)
    dest_buf = cl.Buffer(ctx, mf.WRITE_ONLY, b_array.nbytes)
    prg = cl.Program(ctx, """
    __kernel void sum(__global const float *a,
                      __global const float *b, __global float *res_g)
    {
        int gid = get_global_id(0);
        res_g[gid] = a[gid] + b[gid];
    }
    """).build()
    prg.sum(queue, a_array.shape, None, a_buf, b_buf, dest_buf)
    a_plus_b = numpy.empty_like(a_array)
    cl.enqueue_copy(queue, a_plus_b, dest_buf).wait()
    return list(a_plus_b)

a = [1 for dummy in range(10)]
b = [i for i in range(10)]
print(sum_arrays_with_cl(a, b))
```
Output:
```
[0, 0, 0, 0, 0, 5, 6, 7, 8, 9]
```
What am I doing wrong?
Answer 0 (score: 2)
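You need to state the array types explicitly; otherwise the arrays created on the host will not match what the device expects. `numpy.array` on a list of Python ints produces an integer array (typically `int64`), and the kernel then reinterprets those raw bytes as `float`. Since your kernel expects 32-bit floating-point data, you can create the arrays like this: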
```python
a_array = numpy.array(array1).astype(numpy.float32)
b_array = numpy.array(array2).astype(numpy.float32)
```
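To see why the numbers come out wrong, here is a small NumPy-only sketch (no OpenCL device needed) that approximates what the kernel sees. `ndarray.view` reinterprets the host array's raw bytes as `float32` without converting the values, which is roughly what happens when an integer buffer is bound to a `float *` kernel argument. The sketch pins the host dtype to `int64` as an assumption: that is the usual default for Python ints on 64-bit Linux and macOS, but not on every platform.

```python
import numpy

# Assumption: the host array is int64 (NumPy's usual default for Python ints
# on 64-bit Linux/macOS); the dtype is pinned so the demo behaves the same everywhere.
a_int = numpy.array([1 for _ in range(10)], dtype=numpy.int64)

# Reinterpret the same bytes as float32, as `__global const float *a` would.
# 10 int64 values occupy 80 bytes, i.e. 20 float32 slots; the kernel only
# reads the first 10 of them.
seen_by_kernel = a_int.view(numpy.float32)[:10]
print(seen_by_kernel)  # tiny denormals and zeros, not 1.0

# The fix from the answer: convert the values before creating the buffer.
a_float = numpy.array([1 for _ in range(10)]).astype(numpy.float32)
print(a_float, a_float.dtype, a_float.nbytes)
```

With the converted arrays, `numpy.empty_like(a_array)` in the question's code also becomes `float32`, so the result buffer is read back with the same element type the kernel wrote.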