I am trying to explore ways of using structs in OpenCL.
I first tried this struct (defined on the host):
typedef struct UserStruct {
    cl_int x;
    cl_int y;
    cl_int z;
    cl_int w;
} UserStruct;
and this struct (defined on the device):
typedef struct UserStruct {
    int x;
    int y;
    int z;
    int w;
} UserStruct;
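Using these structs, I create two buffers (para_input and para_output) and initialize them with different values. The kernel function copies the values from para_input to para_output.
This example works fine.
However, when I add a cl_int16 to the struct, the copy kernel no longer works. Here is the modified struct: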
typedef struct UserStruct {
    cl_int x;
    cl_int y;
    cl_int z;
    cl_int w;
    cl_int16 vn16;
} UserStruct;
and the struct (defined on the device):
typedef struct UserStruct {
    int x;
    int y;
    int z;
    int w;
    int16 vn16;
} UserStruct;
Do I need to align the struct on both the host and the device? Or what is the most common way of using structs in OpenCL? Thanks.
Answer 0 (score: 1)
Expanding on the comment:
Your problem seems to be caused by a difference in default structure alignment between your C compiler and the OpenCL compiler. Namely, the C compiler packs the structure into the minimum of 80 bytes (four 4-byte ints plus a 64-byte vector), while the OpenCL compiler lays it out in 128 bytes (which is a good thing to do performance-wise). You can make the two match by specifying the alignment explicitly: either pack both structures, or align both to 128 bytes. See the OpenCL docs and your compiler's docs (which most probably use the same notation) for details.
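To see where the two sizes come from, here is a minimal host-side sketch in C11. It deliberately does not use the OpenCL headers: plain_int16 and aligned_int16 are hypothetical stand-ins (not OpenCL types) for a 16-int vector without and with the 64-byte alignment that OpenCL C gives int16, and print_layouts is just an illustrative helper name.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a 16-int vector that gets no special alignment
   (what a host compiler does if cl_int16 carries no alignment attribute). */
typedef struct { int32_t s[16]; } plain_int16;

/* Hypothetical stand-in for the device-side int16, which OpenCL C aligns
   to its own size of 64 bytes. */
typedef struct { _Alignas(64) int32_t s[16]; } aligned_int16;

typedef struct { int32_t x, y, z, w; plain_int16 vn16; } HostLayout;
typedef struct { int32_t x, y, z, w; aligned_int16 vn16; } DeviceLayout;

/* Print the size of each layout and the offset of its vector member. */
void print_layouts(void) {
    printf("host-like:   size=%zu, offsetof(vn16)=%zu\n",
           sizeof(HostLayout), offsetof(HostLayout, vn16));
    printf("device-like: size=%zu, offsetof(vn16)=%zu\n",
           sizeof(DeviceLayout), offsetof(DeviceLayout, vn16));
}
```

On a typical platform where int32_t is 4-byte aligned, calling print_layouts() reports 80 bytes with the vector at offset 16 for the first layout, and 128 bytes with the vector at offset 64 for the second, so a buffer written with one layout is misread with the other.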
In any case, I would recommend going with the 128-byte alignment, unless you are pressed for space. Declare your structures as:
typedef struct UserStruct {
    cl_int x;
    cl_int y;
    cl_int z;
    cl_int w;
    cl_int16 vn16;
} __attribute__ ((aligned (128))) UserStruct;
and analogously for the device one (using int and int16).
As a side note, nothing prevents you from using the same structure for both the host and the device code. The cl_int types are just aliases for native types anyway (although the explicit alignment specifier will still be necessary, because the structure will potentially be processed by different compilers).
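One way to share a single definition is a header that is used by the host program and also by the kernel source. The following is a minimal sketch, assuming a GCC- or Clang-compatible host compiler. The names shared_int and shared_int16 are made up for illustration, and the host branch uses stdint.h stand-ins where a real program would use cl_int and cl_int16 from CL/cl.h, so that the sketch compiles without the OpenCL headers. It also aligns the vector member itself, on the assumption that the host type has no built-in 64-byte alignment, because aligning the struct as a whole pads its size but does not move its members.

```c
/* Shared by host code and kernel source. __OPENCL_VERSION__ is predefined
   only when the OpenCL C compiler processes this file. */
#ifdef __OPENCL_VERSION__
typedef int   shared_int;
typedef int16 shared_int16;
#else
#include <stddef.h>
#include <stdint.h>
/* Stand-ins for cl_int and cl_int16 (assumption: a real host program would
   include CL/cl.h and use those types instead). */
typedef int32_t shared_int;
typedef struct { int32_t s[16]; } shared_int16;
#endif

typedef struct UserStruct {
    shared_int x;
    shared_int y;
    shared_int z;
    shared_int w;
    /* Force the vector onto a 64-byte boundary on both sides. */
    shared_int16 vn16 __attribute__((aligned(64)));
} __attribute__((aligned(128))) UserStruct;

#ifndef __OPENCL_VERSION__
/* Host-side check that the layout is the one the device is expected to use. */
_Static_assert(sizeof(UserStruct) == 128, "UserStruct must be 128 bytes");
_Static_assert(offsetof(UserStruct, vn16) == 64, "vn16 must start at byte 64");
#endif
```

The static assertions make a layout mismatch a compile-time error on the host instead of a kernel that silently copies the wrong bytes.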
Answer 1 (score: 0)
On a Windows machine with Microsoft's C/C++ compiler, try the following declaration to align the structure; __attribute__ is for the GNU compiler.
typedef __declspec(align(128)) struct UserStruct {
    cl_int x;
    cl_int y;
    cl_int z;
    cl_int w;
    cl_int16 vn16;
} UserStruct;
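The two notations can be hidden behind a pair of macros so that one declaration builds with either compiler. This is a minimal sketch: ALIGN_PREFIX and ALIGN_SUFFIX are hypothetical macro names, and int32_t plus the int16_standin struct stand in for cl_int and cl_int16 so that the example compiles without the OpenCL headers.

```c
#include <stdint.h>

#if defined(_MSC_VER)
/* Microsoft compiler: the alignment goes before the struct keyword. */
#  define ALIGN_PREFIX(n) __declspec(align(n))
#  define ALIGN_SUFFIX(n)
#else
/* GCC and Clang: the alignment goes after the closing brace. */
#  define ALIGN_PREFIX(n)
#  define ALIGN_SUFFIX(n) __attribute__((aligned(n)))
#endif

/* Stand-in for cl_int16 (assumption; use the real type in host code). */
typedef struct { int32_t s[16]; } int16_standin;

typedef ALIGN_PREFIX(128) struct UserStruct {
    int32_t x;
    int32_t y;
    int32_t z;
    int32_t w;
    int16_standin vn16;
} ALIGN_SUFFIX(128) UserStruct;
```

Note that this only aligns and pads the struct as a whole; the offset of vn16 inside it is still decided by each compiler, so it is worth checking that offset on both sides.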