Question

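What is the point of OpenCL image objects checking bounds on reads when I still have to check them manually on writes?

Say I have a kernel that does a simple map from one image to another image of the same dimensions. Because I need global_work_size to be a multiple of local_work_size (I am using OpenCL), some work items exist only as padding and do no useful work. Conditional branches generally slow execution down, so the automatic bounds checking that image2d_t objects perform on reads should help here.

However, it seems I still have to do a manual bounds check when writing back to the other image to avoid risking undefined behavior. So why not just check once at the start and exclude the out-of-bounds work items from reading as well?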

__kernel void check_at_write (__read_only image2d_t input,
                   __write_only image2d_t output,
                   int width, int height)
{
    /* declarations of sampler, indexes */

    /* imagine we're only doing 
     * a simple map with foo
     * for the sake of the example 
     */
    float4 res = foo(read_imagef(input, sampler, (int2)(index_x, index_y)));
    if (index_x < width && index_y < height)
        write_imagef(output, (int2)(index_x, index_y), res);
}

__kernel void check_at_read (__read_only image2d_t input,
                   __write_only image2d_t output,
                   int width, int height)
{
    /* declarations of sampler, indexes */
    if (index_x < width && index_y < height) {
        float4 res = foo(read_imagef(input, sampler, (int2)(index_x, index_y)));
        write_imagef(output, (int2)(index_x, index_y), res);
    }
}

So why have this automatic checking mechanism on reads at all? Is there a way to avoid the checks altogether?

Answer 1

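If the image dimensions match the global work size, you do not need to check either the read or the write. If they do not, add a single check around the whole kernel body and then check neither the read nor the write (assuming both use the same coordinates). In both cases use CLK_ADDRESS_NONE, since it can be faster than the other addressing modes on some hardware. Coordinate clamping (or mirroring, or repeating) is very handy when the read location differs from the write location, for example because you are transforming coordinates or reading a halo around the pixel position for filtering. It is in the hardware anyway, and OpenCL gives you access to it. I have found it very useful in many of our image processing kernels.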
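To make the two cases concrete, here is a minimal plain-C sketch that emulates them on the CPU rather than running on a device. It is an illustration under stated assumptions, not the answerer's code: the helper names (round_up, clamp_to_edge, box3_work_item, box3_run) are hypothetical, the image is a single-channel float array, and clamp_to_edge only stands in for what a sampler with CLK_ADDRESS_CLAMP_TO_EDGE would do in hardware. The loop in box3_run plays the role of the padded NDRange.

```c
#include <assert.h>
#include <stddef.h>

/* Round a global work size up to the next multiple of the local work size.
 * The extra work items this creates are the "padding" items from the question. */
size_t round_up(size_t global, size_t local)
{
    assert(local > 0);
    return (global + local - 1) / local * local;
}

/* CPU stand-in for CLK_ADDRESS_CLAMP_TO_EDGE on one integer coordinate:
 * out-of-range coordinates are pulled back to the nearest valid pixel. */
int clamp_to_edge(int i, int size)
{
    if (i < 0) return 0;
    if (i >= size) return size - 1;
    return i;
}

/* One emulated work item of a horizontal 3-tap box filter.
 * A single guard rejects padding work items, so the write needs no check.
 * The halo reads at x-1 and x+1 are not checked by hand either; the
 * clamping (done by the sampler on a real device) handles the edges. */
void box3_work_item(const float *in, float *out,
                    int width, int height, int x, int y)
{
    if (x >= width || y >= height)
        return; /* padding work item: nothing to read or write */

    float sum = 0.0f;
    for (int dx = -1; dx <= 1; ++dx)
        sum += in[y * width + clamp_to_edge(x + dx, width)];
    out[y * width + x] = sum / 3.0f;
}

/* Emulated launch: iterate over the padded global range, as a device would. */
void box3_run(const float *in, float *out,
              int width, int height, size_t local)
{
    size_t gx = round_up((size_t)width, local);
    size_t gy = round_up((size_t)height, local);
    for (size_t y = 0; y < gy; ++y)
        for (size_t x = 0; x < gx; ++x)
            box3_work_item(in, out, width, height, (int)x, (int)y);
}
```

For a pure map like the one in the question, the inner loop would collapse to a single read at (x, y), and the addressing mode would no longer matter once the guard is in place, which is why CLK_ADDRESS_NONE is suggested there.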
