How can I use intel_sub_group_block_read<n> in OpenCL to distribute the read data across the work-items of a sub-group in column-major order?

Asked: 2017-03-27 12:05:55

Tags: multithreading opencl opencl-c

My vectorized OpenCL code looks like this:

short8 x0, x1, x2, x3, x4, x5, x6, x7, m[8];

x0 = convert_short8(vload8(0, Org + 0 * Stride));
x1 = convert_short8(vload8(0, Org + 1 * Stride));
x2 = convert_short8(vload8(0, Org + 2 * Stride));
x3 = convert_short8(vload8(0, Org + 3 * Stride));
x4 = convert_short8(vload8(0, Org + 4 * Stride));
x5 = convert_short8(vload8(0, Org + 5 * Stride));
x6 = convert_short8(vload8(0, Org + 6 * Stride));
x7 = convert_short8(vload8(0, Org + 7 * Stride));

m[0] = x0 + x4;
m[1] = x1 + x5;
m[2] = x2 + x6;
m[3] = x3 + x7;
m[4] = x0 - x4;
m[5] = x1 - x5;
m[6] = x2 - x6;
m[7] = x3 - x7;
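For reference, the same stage can be written as scalar host code, which makes it easier to see which rows each m[k] combines. This is only a sketch: it assumes Org points to unsigned 8-bit pixels, and the function name butterfly_rows and the constant BLK are hypothetical, not part of the original kernel.

```c
#include <assert.h>
#include <stddef.h>

#define BLK 8  /* the block is 8 rows by 8 columns */

/* Scalar model of the vectorized stage above:
 *   m[k]     = x[k] + x[k + 4]
 *   m[k + 4] = x[k] - x[k + 4]      for k = 0..3,
 * where x[r] is row r of the block, widened from unsigned char to short
 * (the role convert_short8 plays in the kernel). */
static void butterfly_rows(const unsigned char *org, size_t stride,
                           short m[BLK][BLK])
{
    assert(stride >= BLK);  /* consecutive rows must not overlap */
    for (int k = 0; k < BLK / 2; k++) {
        const unsigned char *top = org + (size_t)k * stride;        /* row k     */
        const unsigned char *bot = org + (size_t)(k + 4) * stride;  /* row k + 4 */
        for (int c = 0; c < BLK; c++) {
            m[k][c]     = (short)(top[c] + bot[c]);
            m[k + 4][c] = (short)(top[c] - bot[c]);
        }
    }
}
```

Every output element combines row k with row k + 4 in the same column, so whichever unit does the arithmetic needs both rows available for its columns.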

Now I am trying to rewrite the logic above using the Intel OpenCL sub-group extension with block reads.

int8 iO;
uint8 block1,block2;
int2 coordA;
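coordA = (int2)(0, 0);  // OpenCL C vector literal; x is a byte offset

// Org must be an image2d_t for this (image, int2) overload.
block1 = intel_sub_group_block_read8(Org, coordA);
coordA.x += 4;          // advance by 4 bytes (one uint)
block2 = intel_sub_group_block_read8(Org, coordA);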

for (int i = 0 ; i < 8; i++)
{
    iO.lo = convert_int4(as_uchar4(((uint*)(&block1))[i]));
    iO.hi = convert_int4(as_uchar4(((uint*)(&block2))[i]));
    // Do computations here
}

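Here I am reading 2 blocks of 8 rows each, of type uint. After casting to uchar, I get 2 blocks of 8x4 data, which together form an 8x8 block of uchar data. The problem with this approach is that it gives the work-items their data in row-major order. So if I try to perform a computation like m[0] = x0 + x4, I cannot have x0 and x4 in different work-items. The other approach I can think of is to store the data in the work-items in column-major order, so that I would be using vertical threads instead of horizontal threads. But I cannot figure out how to do that.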
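One way to reason about which bytes land in which work-item is to model the two reads on the host. The sketch below is not driver code and rests on an assumption about the image variant of intel_sub_group_block_read8: that the work-item with sub-group local id lid receives, for each of the 8 rows, the 4 bytes starting at byte offset x + 4 * lid. The helper names (model_block_read8, model_tile, transpose_tile), the pitch parameter, and the sub-group size of 8 are all hypothetical.

```c
#include <assert.h>
#include <string.h>

#define ROWS 8     /* rows returned by a read8 */
#define SG_SIZE 8  /* assumed sub-group size (hypothetical) */

/* Host-side model (an assumption, see the text above) of
 * intel_sub_group_block_read8(image, (int2)(x, y)): work-item `lid`
 * gets, for each of the 8 rows, the 4 bytes at byte offset x + 4 * lid. */
static void model_block_read8(const unsigned char *img, int pitch,
                              int x, int y, int lid,
                              unsigned char out[ROWS][4])
{
    assert(lid >= 0 && lid < SG_SIZE);
    for (int r = 0; r < ROWS; r++)
        memcpy(out[r], img + (y + r) * pitch + x + 4 * lid, 4);
}

/* Mirrors the two reads in the question (x = 0, then x = 4) and joins
 * them into the 8x8 byte tile one work-item holds; tile[r] corresponds
 * to iO in loop iteration r. */
static void model_tile(const unsigned char *img, int pitch, int lid,
                       unsigned char tile[ROWS][8])
{
    unsigned char b1[ROWS][4], b2[ROWS][4];
    model_block_read8(img, pitch, 0, 0, lid, b1);
    model_block_read8(img, pitch, 4, 0, lid, b2);
    for (int r = 0; r < ROWS; r++) {
        memcpy(tile[r], b1[r], 4);      /* iO.lo */
        memcpy(tile[r] + 4, b2[r], 4);  /* iO.hi */
    }
}

/* Column-major view of the same tile: cols[c][r] == tile[r][c]. */
static void transpose_tile(unsigned char tile[ROWS][8],
                           unsigned char cols[8][ROWS])
{
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < 8; c++)
            cols[c][r] = tile[r][c];
}
```

Under this model, work-item lid holds all 8 rows of image bytes 4 * lid through 4 * lid + 7, so the tiles of neighbouring work-items overlap by 4 bytes. Whether that matches the hardware behaviour should be checked against the extension specification before relying on it.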

0 answers