CUDA vs. OpenCL: differences in floating-point precision

Time: 2017-07-21 01:46:03

Tags: opencl

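I am learning OpenCL by porting some existing CUDA functions. My CUDA and OpenCL kernels are below. When the same input arguments are passed to both functions, the outputs differ on the order of 10^-3 to 10^-4. When I call these functions repeatedly, the difference grows significantly, which is very bad for my expected output. Is there something wrong with my OpenCL port?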

Note: I have already tried "-cl-opt-disable" when compiling the OpenCL kernel.

CUDA kernel:

__global__ void normalize_kernel(int N, float *x, float *mean, float *variance, int batch, int filters, int spatial)
{
    int index = (blockIdx.x + blockIdx.y*gridDim.x) * blockDim.x + threadIdx.x;
    if (index >= N) return;
    int f = (index/spatial)%filters;

    x[index] = (x[index] - mean[f])/(sqrt(variance[f] + .00001f));
}

OpenCL kernel:

__kernel void normalize_kernel(int N, __global float *x, __global float *mean, __global float *variance, int filters, int spatial)
{
    int index =  get_group_id(1) * get_global_size(0) + get_global_id(0);
    if (index >= N) return;
    int f = (index/spatial)%filters;

    x[index] = (x[index] - mean[f])/(sqrt(variance[f] + .00001f));
}

Output (CUDA : OpenCL):

{'1.293604': '1.293387',
 '0.727771': '0.727677',
 '0.868133': '0.867531',
 '2.195427': '2.195059'...

1 answer:

Answer 0 (score: 0)

Take a look at the -cl-fp32-correctly-rounded-divide-sqrt build option (OpenCL 1.2 or later). On some hardware it will cost you a little performance, but that is probably not a big deal.
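As a rough sketch of how the option is passed, the host side could look like the following. The helper name `build_with_correct_rounding` is made up for illustration, and `program` and `device` are assumed to have been created already with the usual OpenCL host calls. The device query is only there to show whether the device reports correctly rounded single-precision divide and sqrt.

```c
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>   /* on macOS: <OpenCL/opencl.h> */

/* Hypothetical helper (not from the original post): builds an existing
 * program with correctly rounded single-precision divide and sqrt. */
static cl_int build_with_correct_rounding(cl_program program, cl_device_id device)
{
    /* Ask the device whether it reports the capability (OpenCL 1.2+). */
    cl_device_fp_config fp = 0;
    cl_int err = clGetDeviceInfo(device, CL_DEVICE_SINGLE_FP_CONFIG,
                                 sizeof(fp), &fp, NULL);
    if (err != CL_SUCCESS)
        return err;
    if (!(fp & CL_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT))
        fprintf(stderr, "warning: device does not report correctly rounded divide/sqrt\n");

    /* The option goes into the build options string. */
    err = clBuildProgram(program, 1, &device,
                         "-cl-fp32-correctly-rounded-divide-sqrt", NULL, NULL);
    if (err != CL_SUCCESS) {
        /* Print the build log so a rejected option is visible. */
        size_t len = 0;
        clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, 0, NULL, &len);
        char *log = malloc(len + 1);
        if (log) {
            clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG, len, log, NULL);
            log[len] = '\0';
            fprintf(stderr, "%s\n", log);
            free(log);
        }
    }
    return err;
}
```

If the kernels are built through a wrapper library instead of direct host calls, the same string would go wherever that wrapper accepts build options.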

See https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/clCompileProgram.html