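If you have experience with OpenCL, I would like to ask for your help. The task is so trivial that it is embarrassing I cannot see what is wrong, but so far I have not been able to solve it. We have 3D volume data stored as 2D slices. On the CPU side, in Java, each slice is compressed into a bit array, so the size of each compressed slice (in ints) is computed as: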
sliceSize = (width*height+31)/32;
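As a quick sanity check of that formula, here is a minimal sketch (the class name and the sample sizes are made up for illustration). The `+ 31` rounds the division up, so a slice whose voxel count is not a multiple of 32 still gets a final, partially filled int.

```java
// Hypothetical helper, for illustration only.
class SliceSizeDemo {
    // Number of 32-bit ints needed to hold width * height one-bit voxels, rounded up.
    static int sliceSize(int width, int height) {
        return (width * height + 31) / 32;
    }

    public static void main(String[] args) {
        System.out.println(sliceSize(8, 4));   // 32 voxels  -> 1 int
        System.out.println(sliceSize(5, 7));   // 35 voxels  -> 2 ints
        System.out.println(sliceSize(10, 10)); // 100 voxels -> 4 ints
    }
}
```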
The Java code that compresses the slices from 1 byte per voxel to 1 int per 32 voxels is:
hostUncompressed = new byte[depth * height * width];
hostCompressed = new int[depth * sliceSize];
deviceUncompressed = new byte[depth * height * width];
deviceCompressed = new int[depth * sliceSize];
int numOnes = 0;
int k = 0;
for (int i = 0; i < depth; ++i) {
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            hostUncompressed[k++] = (byte) (((int) (Math.random() * 1000)) % 2);
            numOnes += (hostUncompressed[k - 1] == 1) ? 1 : 0;
        }
    }
}
for (int i = 0; i < depth; ++i) {
    int start = i * sliceSize;
    int index = start;
    int targetIndex = 0;
    int mask = 1;
    int buffer = 0;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (hostUncompressed[index] > 0) {
                buffer |= mask;
            }
            ++index;
            if ((index & 31) == 0) {
                hostCompressed[start + targetIndex++] = buffer;
                buffer = 0;
                mask = 1;
            } else {
                mask <<= 1;
            }
        }
    }
}
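For comparison, here is a minimal sketch of the same packing written with two separate indices, one into the raw bytes and one into the packed ints. It assumes the layout described above: each compressed slice starts at `i * sliceSize`, and bit `n % 32` of int `n / 32` holds voxel `n` of that slice. `BitPackReference` and `pack` are made-up names. It may be worth comparing against it, because in the loop above `index` starts at `i * sliceSize` yet is also used to read `hostUncompressed`, whose slices are `width * height` bytes apart, so for slices after the first the loop reads different voxels than this sketch does.

```java
import java.util.Arrays;

// Hypothetical CPU reference, for illustration only.
class BitPackReference {
    static int[] pack(byte[] raw, int width, int height, int depth) {
        int rawSliceSize = width * height;          // bytes per uncompressed slice
        int sliceSize = (rawSliceSize + 31) / 32;   // ints per compressed slice
        int[] packed = new int[depth * sliceSize];
        for (int i = 0; i < depth; ++i) {
            int rawStart = i * rawSliceSize;        // index into the raw bytes
            int packedStart = i * sliceSize;        // index into the packed ints
            for (int n = 0; n < rawSliceSize; ++n) {
                if (raw[rawStart + n] != 0) {
                    // voxel n of this slice goes to bit (n % 32) of int (n / 32)
                    packed[packedStart + (n >> 5)] |= 1 << (n & 31);
                }
            }
        }
        return packed;
    }

    public static void main(String[] args) {
        // Two slices of 5 x 7 = 35 voxels each, so 2 ints per slice.
        byte[] raw = new byte[2 * 35];
        raw[0] = 1;   // slice 0, voxel 0
        raw[34] = 1;  // slice 0, voxel 34
        raw[35] = 1;  // slice 1, voxel 0
        System.out.println(Arrays.toString(pack(raw, 5, 7, 2))); // prints [1, 4, 1, 0]
    }
}
```

Writing each bit directly into its target int also means a partially filled last int needs no separate flush at the end of a slice.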
My OpenCL port looks like this:
public void compress(cl_mem vol, int[] size3, int[] voxels) {
    int totalCompressedSize = voxels.length;
    cl_mem devCompressed = CL.clCreateBuffer(cl.getContext(),
            CL.CL_MEM_WRITE_ONLY, Sizeof.cl_int * totalCompressedSize,
            null, null);
    int[] sliceSizeInts = new int[]{(size3[0] * size3[1] + 31) / 32};
    int[] dimensions = new int[]{size3[0], size3[1], size3[2], 0};
    long[] localWorkSize = new long[]{1, 1, 1};
    long[] globalWorkSize = new long[]{sliceSizeInts[0], size3[2], 1};
    cl.calcLocalWorkSize(globalWorkSize, localWorkSize);
    CLUtils.round_size(localWorkSize, globalWorkSize);
    int k = 0;
    CL.clSetKernelArg(kernels[1], k++, Sizeof.cl_mem,
            Pointer.to(devCompressed));
    CL.clSetKernelArg(kernels[1], k++, Sizeof.cl_mem, Pointer.to(vol));
    CL.clSetKernelArg(kernels[1], k++, Sizeof.cl_int4,
            Pointer.to(dimensions));
    CL.clEnqueueNDRangeKernel(cl.getCommandQueue(), kernels[1], 2, null,
            globalWorkSize, localWorkSize, 0, null, null);
    CL.clEnqueueReadBuffer(cl.getCommandQueue(), devCompressed, CL.CL_TRUE,
            0, Sizeof.cl_int * totalCompressedSize, Pointer.to(voxels), 0,
            null, null);
    CL.clReleaseMemObject(devCompressed);
    CL.clFinish(cl.getCommandQueue());
}

kernel void roiVolume_dataCompress(
        global int* compressed,
        global char* raw,
        int4 dimensions) {
    int comprSubId = get_global_id(0);
    int sliceIndex = get_global_id(1);
    int rawSliceSize = dimensions.y * dimensions.x;
    int comprSliceSize = (rawSliceSize + 31)/32;
    if ( sliceIndex < 0 || sliceIndex >= dimensions.z ||
         comprSubId < 0 || comprSubId >= comprSliceSize )
        return;
    int rawIndex;
    int rawSubIndex;
    int value = 0;
    for (int i = 0; i < 32; ++i)
    {
        rawSubIndex = comprSubId*32+i;
        if ( rawSubIndex < rawSliceSize)
        {
            rawIndex = sliceIndex * rawSliceSize + rawSubIndex;
            if (raw[rawIndex] != 0)
                value |= (1 << i);
        }
    }
    int comprIndex = sliceIndex * comprSliceSize + comprSubId;
    compressed[comprIndex] = value;
}
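To check the kernel's indexing without a device, the same per-work-item logic can be replayed on the CPU. This is only a sketch: `KernelIndexEmulation` and `emulate` are made-up names, the two outer loops stand in for `get_global_id(1)` and `get_global_id(0)`, and `dims` plays the role of `dimensions.x`, `.y` and `.z`.

```java
import java.util.Arrays;

// Hypothetical CPU replay of the kernel body, for illustration only.
class KernelIndexEmulation {
    // dims = {width, height, depth}
    static int[] emulate(byte[] raw, int[] dims) {
        int rawSliceSize = dims[1] * dims[0];
        int comprSliceSize = (rawSliceSize + 31) / 32;
        int[] compressed = new int[dims[2] * comprSliceSize];
        for (int sliceIndex = 0; sliceIndex < dims[2]; ++sliceIndex) {            // get_global_id(1)
            for (int comprSubId = 0; comprSubId < comprSliceSize; ++comprSubId) { // get_global_id(0)
                int value = 0;
                for (int i = 0; i < 32; ++i) {
                    int rawSubIndex = comprSubId * 32 + i;
                    if (rawSubIndex < rawSliceSize) {
                        int rawIndex = sliceIndex * rawSliceSize + rawSubIndex;
                        if (raw[rawIndex] != 0) {
                            value |= (1 << i);
                        }
                    }
                }
                compressed[sliceIndex * comprSliceSize + comprSubId] = value;
            }
        }
        return compressed;
    }

    public static void main(String[] args) {
        // Two slices of 5 x 7 = 35 voxels each, so 2 ints per slice.
        byte[] raw = new byte[2 * 35];
        raw[0] = 1;   // slice 0, voxel 0
        raw[34] = 1;  // slice 0, voxel 34
        raw[35] = 1;  // slice 1, voxel 0
        System.out.println(Arrays.toString(emulate(raw, new int[]{5, 7, 2}))); // prints [1, 4, 1, 0]
    }
}
```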
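It works if depth = 1, that is, when it runs on a single slice only, but from the second slice onward the results are wrong, and I cannot see any pattern in the arrays that might help.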
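One way to look for a pattern is to report the first int where the CPU result (`hostCompressed`) and the device result (`voxels`) differ, decoded into slice number and position within the slice. The following is a rough sketch with made-up names (`FirstMismatch`, `find`), not part of the code above.

```java
import java.util.Arrays;

// Hypothetical debugging helper, for illustration only.
class FirstMismatch {
    // Returns {wordIndex, sliceIndex, wordInSlice, differingBits} for the first
    // differing int, or null if the compared range is identical.
    static int[] find(int[] expected, int[] actual, int sliceSize) {
        int n = Math.min(expected.length, actual.length);
        for (int k = 0; k < n; ++k) {
            if (expected[k] != actual[k]) {
                return new int[]{k, k / sliceSize, k % sliceSize, expected[k] ^ actual[k]};
            }
        }
        return null;
    }

    public static void main(String[] args) {
        int[] cpu = {1, 4, 1, 0};
        int[] gpu = {1, 4, 3, 0};
        // First difference is at int 2: slice 1, int 0 of that slice, bit 1 differs.
        System.out.println(Arrays.toString(find(cpu, gpu, 2))); // prints [2, 1, 0, 2]
    }
}
```

If the first mismatch always falls on the first int of slice 1, that would point at how one of the two sides computes a slice's starting offset rather than at the bit packing itself.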
I would really appreciate any help. Thank you.