Question

我想我处于一个非常奇怪的边界情况，可能有双精度问题，我想知道，发生了什么。

在OpenCL内核中我使用：

#pragma OPENCL EXTENSION cl_khr_fp64 : enable

__private int k = 2; // I need k to be an int, because I want to use as a counter
__private double s = 18;
__private double a = 1;

a = a/(double)k; // just to show, that I make in-place typecasting of k
a = k+1;
k = (int)a; //to show that I store k in a double buffer in an intermediate-step
if ((k-1)==2)
{
//    k = 3;
    s = pow(s/(double)(k-1),0.5);
}

这导致我s = 2.999[...]6 但是，如果我取消注释k=3行，我会得到（在我看来）正确的结果s = 3。那是为什么？

作为附带信息：当我这样做时，不会发生同样的行为

s = sqrt(s/(double)(k-1))

下面是pyopencl的完整，最小内核和主机代码

内核（Minima.cl）：

#pragma OPENCL EXTENSION cl_khr_fp64 : enable

__kernel void init_z(__global double * buffer)
{
    __private int x = get_global_id(0);
    __private int y = get_global_id(1);
    //w,h
    __private int w_y = get_global_size(1);
    __private int address = x*w_y+y;
    //h,w
    __private double init = 3.0;
    buffer[address]=init;
}

__kernel void root(__global double * buffer)
{
    __private int x = get_global_id(0);
    __private int y = get_global_id(1);
    //w,h
    __private int w_y = get_global_size(1);
    __private int address = x*w_y+y;
    //h,w
    __private double value = 18;
    __private int k;
    __private double out;
    k = (int) buffer[address];
  //k = 3;  If this line is uncommented, the result will be exact.
    out = pow(value/(double)(k-1), 0.5);
    buffer[address] = out;
}

主机：

import pyopencl as cl
import numpy as np


platform = cl.get_platforms()[0]
devs = platform.get_devices()
device1 = devs[1]
h_buffer = np.empty((10,10)).astype(np.float64)
mf = cl.mem_flags
ctx = cl.Context([device1])
Queue1 = cl.CommandQueue(ctx,properties=cl.command_queue_properties.PROFILING_ENABLE)
Queue2 = cl.CommandQueue(ctx,properties=cl.command_queue_properties.PROFILING_ENABLE)
mf = cl.mem_flags
m_dic = {0:mf.READ_ONLY,1:mf.WRITE_ONLY,2:mf.READ_WRITE}


fi = open('Minimal.cl', 'r')
fstr = "".join(fi.readlines())
prg = cl.Program(ctx, fstr).build()
knl = prg.init_z
knl.set_scalar_arg_dtypes([None,])
knl_root = prg.root
knl_root.set_scalar_arg_dtypes([None,])



def f():
    d_buffer =  cl.Buffer(ctx,m_dic[2], int(10 * 10  * 8))
    knl.set_args(d_buffer)
    knl_root.set_args(d_buffer)
    a = cl.enqueue_nd_range_kernel(Queue2,knl,(10,10),None)
    b = cl.enqueue_nd_range_kernel(Queue2,knl_root,(10,10),None, wait_for = [a,])
    cl.enqueue_copy(Queue1,h_buffer,d_buffer,wait_for=[b,])
    return h_buffer
a = f()
a[0,0] # Getting the result on the host.

修改：由于更多的不清楚，我再次更新这个问题。我理解，对于相同的输入，pow和sqrt的值不必相同。我的问题是，为什么pow显示SAME输入的不同输出，这取决于我从哪里得到它。

二进制文件在pastebin上： k_explicit和k_read

printf("a%\n", out)带有0x1.8p+1行k=3，0x1.7ffffffffffffp+1带注释。

Answer 1

因此，经过大量讨论后，我的代码似乎没有问题。这种行为似乎与硬件有关，到目前为止只能在Nvidia硬件上重现。我仍然对发生的“为什么”和“如何”感兴趣，但在这种情况下问题得到解答。

OpenCL / C pow（x，0.5）！= sqrt（x）

1 个答案: