How to use user-defined structures in OpenCL

Date: 2016-02-03 02:01:18

Tags: c++ opencl

I am exploring ways to use structures in OpenCL.

I first tried this struct (defined on the host):

typedef struct UserStruct {

    cl_int x;
    cl_int y;
    cl_int z;
    cl_int w;
} UserStruct;

and this struct (defined on the device):

typedef struct UserStruct {
    int x;
    int y;
    int z;
    int w;
} UserStruct;

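Using these structures, I create two buffers (para_input and para_output) and initialize them with different values. The kernel function copies the values from para_input to para_output.

This example works fine.

However, when I add a cl_int16 to the structure, the copy kernel no longer works. Here is the modified structure: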

typedef struct UserStruct {

    cl_int x;
    cl_int y;
    cl_int z;
    cl_int w; 

    cl_int16 vn16;
} UserStruct;

and the struct (defined on the device):

typedef struct UserStruct {
    int x;
    int y;
    int z;
    int w;

    int16 vn16;
} UserStruct;

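Do the structures need to be aligned on both the host and the device? Or what is the most common way to use structures in OpenCL? Thanks.

2 answers:

Answer 0 (score: 1)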

Expanding on the comment:

It seems that your problem is caused by a difference in the default structure layout between your C compiler and the OpenCL compiler. Namely, the C compiler packs the structure into the minimum of 80 bytes (four 4-byte ints followed directly by the 64-byte vector), while the OpenCL compiler aligns int16 to its own size of 64 bytes, which places vn16 at offset 64 and makes the structure 128 bytes (which is a good thing to do performance-wise). You can make the layouts match by specifying them explicitly: either pack both structures, or align both to 128 bytes. See the OpenCL documentation and your compiler's documentation (which most probably uses the same notation) for details.
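To make the mismatch concrete, here is a minimal host-only C11 sketch that emulates both layouts with plain types. The names HostVec16, DeviceVec16, HostLayout and DeviceLayout are hypothetical stand-ins introduced for illustration; the sketch assumes the host sees the vector with no extra alignment, which is what the 80-byte figure above implies, and it does not use the real OpenCL headers.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignas */
#include <stddef.h>    /* offsetof */
#include <stdint.h>    /* int32_t */

/* Hypothetical stand-in: a 16-int vector with only 4-byte alignment,
   as an unaligned host-side vector type would behave. */
typedef struct { int32_t s[16]; } HostVec16;

/* Hypothetical stand-in: a 16-int vector aligned to its 64-byte size,
   mirroring how OpenCL C aligns int16 on the device. */
typedef struct { alignas(64) int32_t s[16]; } DeviceVec16;

typedef struct { int32_t x, y, z, w; HostVec16   vn16; } HostLayout;
typedef struct { int32_t x, y, z, w; DeviceVec16 vn16; } DeviceLayout;

/* Packed-style layout: vn16 follows the four ints directly. */
static_assert(sizeof(HostLayout) == 80, "host-style layout is 80 bytes");
static_assert(offsetof(HostLayout, vn16) == 16, "vn16 at offset 16");

/* Device-style layout: 48 bytes of padding precede vn16. */
static_assert(sizeof(DeviceLayout) == 128, "device-style layout is 128 bytes");
static_assert(offsetof(DeviceLayout, vn16) == 64, "vn16 at offset 64");

/* Returns 1 only if both layouts agree in size and in the offset of vn16. */
int layouts_match(void)
{
    return sizeof(HostLayout) == sizeof(DeviceLayout)
        && offsetof(HostLayout, vn16) == offsetof(DeviceLayout, vn16);
}
```

Because the two layouts differ in both total size and the offset of vn16, a buffer filled with one layout and read with the other yields shifted fields, which is consistent with the copy kernel appearing not to work.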

In any case, I would recommend going with the 128-byte alignment, unless you are pressed for space. Declare your structures as:

typedef struct UserStruct {

    cl_int x;
    cl_int y;
    cl_int z;
    cl_int w;

    cl_int16 vn16;
} __attribute__ ((aligned (128))) UserStruct;

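and analogously for the device one, using int and int16 in place of cl_int and cl_int16. Note that a structure-level alignment only rounds up the total size; if your host compiler does not already place vn16 at offset 64, the member itself also needs a 64-byte alignment (or explicit padding before it) so that the field offsets match the device.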

As a side note, nothing prevents you from using the same structure definition for both the host and the device code. cl_int is just a typedef for a native 32-bit integer type anyway; only the vector type names differ (cl_int16 on the host, int16 on the device), which a small typedef layer can hide. The explicit alignment specifier will still be necessary, because the structure will potentially be processed by different compilers.
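One possible way to share a single definition is to branch on __OPENCL_VERSION__, a macro that OpenCL C compilers predefine. The following is a sketch under stated assumptions, not a definitive implementation: the names us_int, us_int16 and SharedUserStruct are hypothetical, and the host branch uses plain stand-in types so that it builds without the OpenCL headers (real host code would typically use cl_int and cl_int16 from CL/cl.h instead).

```c
#ifdef __OPENCL_VERSION__
/* Device side: OpenCL C built-in types. int16 is aligned to 64 bytes. */
typedef int   us_int;
typedef int16 us_int16;
#else
/* Host side: stand-ins (an assumption of this sketch) with the same
   size and alignment as the device types. */
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignas */
#include <stddef.h>    /* offsetof */
#include <stdint.h>    /* int32_t */
typedef int32_t us_int;
typedef struct { alignas(64) int32_t s[16]; } us_int16;
#endif

/* Single definition seen by both the host and the device compiler. */
typedef struct SharedUserStruct {
    us_int   x;
    us_int   y;
    us_int   z;
    us_int   w;
    us_int16 vn16;   /* expected at offset 64 on both sides */
} SharedUserStruct;

#ifndef __OPENCL_VERSION__
/* Host-only compile-time checks against the device layout described above. */
static_assert(sizeof(SharedUserStruct) == 128, "size must match the device");
static_assert(offsetof(SharedUserStruct, vn16) == 64, "vn16 offset must match");
#endif
```

Putting the alignment on the vector member rather than only on the whole structure pins down the offset of vn16 as well as the total size, so the host-side checks fail at compile time if a compiler lays the structure out differently.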

Answer 1 (score: 0)

On a Windows machine or with the Visual C++ compiler, try the following line to align the structure; __attribute__ is for the GNU compilers.

typedef __declspec(align(128)) struct UserStruct {

    cl_int x;
    cl_int y;
    cl_int z;
    cl_int w;

    cl_int16 vn16;
} UserStruct;