OpenCL: compressing binary data

Date: 2015-08-06 08:12:37

Tags: java opencl

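If you have experience with OpenCL, I'd appreciate your help. The task is so trivial that it's embarrassing I can't see what's wrong, but I'm stuck on it. We have binary 3D volume data stored as 2D slices. On the CPU side, in Java, each slice is compressed into a bit array, so the compressed size of each slice (in ints) is computed as: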

    sliceSize = (width*height+31)/32;
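The formula is integer ceiling division: it rounds `width*height` bits up to whole 32-bit ints, so a slice whose voxel count is not a multiple of 32 ends with one partially filled int. Note that `sliceSize` counts ints in the compressed array, not voxels in the uncompressed one. A minimal sketch (the class and method names are illustrative only):

```java
// Sketch: number of 32-bit ints needed to hold one slice at 1 bit per voxel.
// (width * height + 31) / 32 is integer ceiling division by 32.
class SliceSizeDemo {
    static int compressedSliceSize(int width, int height) {
        int voxels = width * height;   // bits needed for one slice
        return (voxels + 31) / 32;     // round up to whole ints
    }

    public static void main(String[] args) {
        System.out.println(compressedSliceSize(64, 64)); // 4096 voxels -> 128 ints
        System.out.println(compressedSliceSize(5, 7));   // 35 voxels -> 2 ints, last one partly unused
    }
}
```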

The Java code that compresses each slice from 1 byte per voxel to 1 int per 32 voxels is:

    hostUncompressed = new byte[depth * height * width];
    hostCompressed = new int[depth * sliceSize];

    deviceUncompressed = new byte[depth * height * width];
    deviceCompressed = new int[depth * sliceSize];

    int numOnes = 0;
    int k = 0;
    for (int i = 0; i < depth; ++i) {
        for (int y = 0; y < height; ++y) {
            for (int x = 0; x < width; ++x) {
                hostUncompressed[k++] = (byte) (((int) (Math.random() * 1000)) % 2);
                numOnes += (hostUncompressed[k - 1] == 1) ? 1 : 0;
            }
        }
    }
    for (int i = 0; i < depth; ++i) {
        int start = i * sliceSize;
        int index = start;
        int targetIndex = 0;
        int mask = 1;
        int buffer = 0;
        for (int y = 0; y < height; ++y) {
            for (int x = 0; x < width; ++x) {
                if (hostUncompressed[index] > 0) {
                    buffer |= mask;
                }
                ++index;
                if ((index & 31) == 0) {
                    hostCompressed[start + targetIndex++] = buffer;
                    buffer = 0;
                    mask = 1;
                } else {
                    mask <<= 1;
                }
            }
        }
    }
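One thing worth checking in the CPU loop above: `index` starts at `i * sliceSize`, but `sliceSize` is the length of a compressed slice in ints, while slice `i` of `hostUncompressed` starts at `i * width * height`. For `i = 0` both offsets are 0, so only the first slice is read from the right place, which would make the CPU reference (rather than the kernel) disagree from the second slice on. Separately, `(index & 31) == 0` tests the global index, so word boundaries only line up with slice boundaries when `width * height` is a multiple of 32, and a final partial int is never stored otherwise. A sketch of a per-slice reference, assuming the raw array is laid out slice-major (depth × height × width) like `hostUncompressed`; `SliceCompressor` and `compress` are hypothetical names:

```java
// Sketch: pack a volume of depth slices, each width*height bytes (0 or 1),
// into depth * sliceSize ints at 1 bit per voxel. Offsets are computed per
// slice, so the raw read position uses width*height, not sliceSize.
class SliceCompressor {
    static int[] compress(byte[] raw, int width, int height, int depth) {
        int rawSliceSize = width * height;         // voxels per slice
        int sliceSize = (rawSliceSize + 31) / 32;  // ints per compressed slice
        int[] compressed = new int[depth * sliceSize];
        for (int z = 0; z < depth; ++z) {
            int rawStart = z * rawSliceSize;       // first voxel of slice z
            int comprStart = z * sliceSize;        // first int of slice z
            for (int v = 0; v < rawSliceSize; ++v) {
                if (raw[rawStart + v] != 0) {
                    // voxel v of the slice -> bit (v % 32) of word (v / 32),
                    // the same layout the kernel writes
                    compressed[comprStart + (v >>> 5)] |= 1 << (v & 31);
                }
            }
        }
        return compressed;
    }
}
```

ORing straight into the output array removes the need to flush a partial buffer at the end of each slice.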

My OpenCL port looks like this:

    public void compress(cl_mem vol, int[] size3, int[] voxels) {
        int totalCompressedSize = voxels.length;

        cl_mem devCompressed = CL.clCreateBuffer(cl.getContext(),
                CL.CL_MEM_WRITE_ONLY, Sizeof.cl_int * totalCompressedSize,
                null, null);

        int[] sliceSizeInts = new int[]{(size3[0] * size3[1] + 31) / 32};
        int[] dimensions = new int[]{size3[0], size3[1], size3[2], 0};
        long[] localWorkSize = new long[]{1, 1, 1};
        long[] globalWorkSize = new long[]{sliceSizeInts[0], size3[2], 1};

        cl.calcLocalWorkSize(globalWorkSize, localWorkSize);
        CLUtils.round_size(localWorkSize, globalWorkSize);

        int k = 0;
        CL.clSetKernelArg(kernels[1], k++, Sizeof.cl_mem,
                Pointer.to(devCompressed));
        CL.clSetKernelArg(kernels[1], k++, Sizeof.cl_mem, Pointer.to(vol));
        CL.clSetKernelArg(kernels[1], k++, Sizeof.cl_int4,
                Pointer.to(dimensions));

        CL.clEnqueueNDRangeKernel(cl.getCommandQueue(), kernels[1], 2, null,
                globalWorkSize, localWorkSize, 0, null, null);

        CL.clEnqueueReadBuffer(cl.getCommandQueue(), devCompressed, CL.CL_TRUE,
                0, Sizeof.cl_int * totalCompressedSize, Pointer.to(voxels), 0,
                null, null);
        CL.clReleaseMemObject(devCompressed);
        CL.clFinish(cl.getCommandQueue());
    }
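`cl.calcLocalWorkSize` and `CLUtils.round_size` appear to be the asker's own helpers, and their code isn't shown. Assuming `round_size` rounds each global size up to a multiple of the local size (OpenCL 1.x requires the global size to be evenly divisible by the local size when a local size is passed), the extra padded work-items are what the kernel's early `return` guards against. A sketch of that rounding; `WorkSizeRounding` and `roundUp` are hypothetical names:

```java
// Sketch: round each global work size up to a multiple of the local size.
// This is an assumption about what CLUtils.round_size does, not its code.
class WorkSizeRounding {
    static long roundUp(long global, long local) {
        return ((global + local - 1) / local) * local;
    }

    static long[] roundAll(long[] global, long[] local) {
        long[] result = new long[global.length];
        for (int d = 0; d < global.length; ++d) {
            result[d] = roundUp(global[d], local[d]);
        }
        return result;
    }
}
```

With a compressed slice size of 100 ints and a local size of 16 in dimension 0, `roundUp` gives 112, so 12 extra work-items per slice run and leave through the `comprSubId >= comprSliceSize` check.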

    kernel void roiVolume_dataCompress(
            global int*  compressed,
            global char* raw,
            int4         dimensions) {
        int comprSubId = get_global_id(0);
        int sliceIndex = get_global_id(1);

        int rawSliceSize   = dimensions.y * dimensions.x;
        int comprSliceSize = (rawSliceSize + 31) / 32;

        if (sliceIndex < 0 || sliceIndex >= dimensions.z ||
            comprSubId < 0 || comprSubId >= comprSliceSize)
            return;

        int rawIndex;
        int rawSubIndex;
        int value = 0;

        for (int i = 0; i < 32; ++i) {
            rawSubIndex = comprSubId * 32 + i;
            if (rawSubIndex < rawSliceSize) {
                rawIndex = sliceIndex * rawSliceSize + rawSubIndex;
                if (raw[rawIndex] != 0)
                    value |= (1 << i);
            }
        }

        int comprIndex = sliceIndex * comprSliceSize + comprSubId;
        compressed[comprIndex] = value;
    }
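One way to check the kernel's indexing without a GPU is to run its body as plain Java over the same 2D range. The class below is a hypothetical test helper, a line-by-line translation of `roiVolume_dataCompress` in which the two `get_global_id` values become loop variables and `dims` holds `dimensions.x`, `.y`, `.z`:

```java
// Sketch: CPU emulation of roiVolume_dataCompress. Each (comprSubId, sliceIndex)
// pair plays the role of one work-item inside the valid range.
class KernelEmulation {
    static int[] run(byte[] raw, int[] dims) {
        int rawSliceSize = dims[1] * dims[0];
        int comprSliceSize = (rawSliceSize + 31) / 32;
        int[] compressed = new int[dims[2] * comprSliceSize];
        for (int sliceIndex = 0; sliceIndex < dims[2]; ++sliceIndex) {
            for (int comprSubId = 0; comprSubId < comprSliceSize; ++comprSubId) {
                int value = 0;
                for (int i = 0; i < 32; ++i) {
                    int rawSubIndex = comprSubId * 32 + i;
                    if (rawSubIndex < rawSliceSize) {
                        int rawIndex = sliceIndex * rawSliceSize + rawSubIndex;
                        if (raw[rawIndex] != 0) {
                            value |= 1 << i;   // bit i <- voxel comprSubId*32+i
                        }
                    }
                }
                compressed[sliceIndex * comprSliceSize + comprSubId] = value;
            }
        }
        return compressed;
    }
}
```

Comparing this output with both the device result and the CPU reference for `depth > 1` shows which of the two disagrees.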

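It works when depth = 1, i.e. when it runs on a single slice, but from the second slice on the results are wrong, and I can't see any pattern in the arrays that might help.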

Any help is really appreciated. Thanks.
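To look for a pattern, it can help to report the first slice and word where the host and device arrays differ, together with the XOR of the two words (the set bits are the voxels that disagree). A minimal sketch, assuming both arrays hold `depth * sliceSize` ints; `CompressedDiff` and `firstMismatch` are hypothetical names:

```java
// Sketch: locate the first differing word between two compressed volumes.
class CompressedDiff {
    // Compares the common prefix of both arrays.
    // Returns {slice, wordInSlice, xorBits}, or null if no word differs.
    static int[] firstMismatch(int[] host, int[] device, int sliceSize) {
        int n = Math.min(host.length, device.length);
        for (int k = 0; k < n; ++k) {
            if (host[k] != device[k]) {
                return new int[]{k / sliceSize, k % sliceSize, host[k] ^ device[k]};
            }
        }
        return null;
    }
}
```

A mismatch that starts at word 0 of slice 1 would point at the per-slice offsets rather than at the bit packing itself.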

0 answers