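I am learning OpenCL by porting some existing CUDA functions. My CUDA and OpenCL kernels are shown below. When the same input arguments are passed to both functions, the outputs differ on the order of 10^-3 to 10^-4. When I call these functions repeatedly, the difference grows significantly, which badly affects my expected output. Is there something wrong with my OpenCL port?
Note: I have already tried "-cl-opt-disable" when compiling the OpenCL kernel.
CUDA kernel: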
__global__ void normalize_kernel(int N, float *x, float *mean, float *variance, int batch, int filters, int spatial)
{
    int index = (blockIdx.x + blockIdx.y*gridDim.x) * blockDim.x + threadIdx.x;
    if (index >= N) return;
    int f = (index/spatial)%filters;
    x[index] = (x[index] - mean[f])/(sqrt(variance[f] + .00001f));
}
OpenCL kernel:
__kernel void normalize_kernel(int N, __global float *x, __global float *mean, __global float *variance, int filters, int spatial)
{
    int index = get_group_id(1) * get_global_size(0) + get_global_id(0);
    if (index >= N) return;
    int f = (index/spatial)%filters;
    x[index] = (x[index] - mean[f])/(sqrt(variance[f] + .00001f));
}
Output (CUDA : OpenCL):
{'1.293604': '1.293387',
'0.727771': '0.727677',
'0.868133': '0.867531',
'2.195427': '2.195059'...
Answer 0 (score: 0)
Take a look at the -cl-fp32-correctly-rounded-divide-sqrt build option (OpenCL 1.2 or later). On some hardware this will cost you a little performance, but it is probably not a big deal.
See https://www.khronos.org/registry/OpenCL/sdk/1.2/docs/man/xhtml/clCompileProgram.html
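The option is passed in the options string given to clBuildProgram or clCompileProgram. Below is a minimal sketch of assembling that string on the host; the helper make_build_options is hypothetical, and the OpenCL call appears only in a comment so the snippet compiles without the OpenCL headers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build option named in the answer (OpenCL 1.2 or later). */
#define ROUNDING_FLAG "-cl-fp32-correctly-rounded-divide-sqrt"

/* Hypothetical helper: writes "<existing> <flag>" into buf, or just the
   flag when there are no existing options.
   Returns 0 on success, -1 if buf is too small. */
int make_build_options(char *buf, size_t size, const char *existing)
{
    int n;
    assert(buf != NULL && size > 0);
    if (existing != NULL && existing[0] != '\0')
        n = snprintf(buf, size, "%s %s", existing, ROUNDING_FLAG);
    else
        n = snprintf(buf, size, "%s", ROUNDING_FLAG);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* With a real cl_program and cl_device_id, the resulting string would be
   the fourth argument of clBuildProgram:
       clBuildProgram(program, 1, &device, buf, NULL, NULL);            */
```

If the build fails after adding the flag, the build log from clGetProgramBuildInfo is the place to check for the reason.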