OpenCL kernel float division gives different result

Question

I have an OpenCL kernel that performs some computation. I found that only one thread gives a result different from the CPU code. I am using VS2010, x64, release mode.

While checking the OpenCL code with some examples, I found some interesting results. Here are the test examples from the kernel code.

I tested 3 cases in the OpenCL kernel; the precision is checked with printf("%.10f", fval);

case 1:

float fval = (10296184.0) / (float)(x*y*z);  // which gives result fval = 3351.6225585938

float fval = (10296184.0f) / (float)(x*y*z);  // which gives result fval = 3351.6225585938

The variables are declared as: int x, y, z

Their values are computed by some operations at run time and are x=12, y=16, z=16.

case 2:

float fval = (10296184.0) / (float)(12*16*16); // which gives result fval = 3351.6223144531

float fval = (10296184.0f) / (float)(12*16*16); // which gives result fval = 3351.6223144531

case 3:

However, when I compute the difference between the two expressions above, the result is 0 if I use 10296184.0.

float fval = (10296184.0) / (float)(x*y*z) - (10296184.0) / (float)(12*16*16); // which gives result fval = 0.0000000000

float fval = (10296184.0f) / (float)(x*y*z) - (10296184.0f) / (float)(12*16*16); // which gives result fval = 0.0001812663

Could anyone explain the reason or give me some hints?

Answer 1

Some observations:

The two float values differ by 1 ULP, so the results differ by the smallest possible amount.

// Float ULP in the 2's place here
//       v
0x1.a2f3ea0000000p+11 3351.622314... // OP's lower float value
0x1.a2f3eaaaaaaabp+11 3351.622395... // higher precision quotient
0x1.a2f3ec0000000p+11 3351.622558... // OP's higher float value

(10296184.0) / (float)(12*16*16) is calculated at compile time, and it gives the result closer to the expected mathematical answer.

float fval = (10296184.0) / (float)(x*y*z) is calculated at run time.

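Considering the use of float variables, it is surprising that the code uses double math for this division. It is a double constant divided by a double (the promotion of the float product), producing a double quotient, which is converted to float and then saved. I would have expected 10296184.0f (note the f) to be used, so that the math could all be done in float.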

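C allows different rounding modes, as indicated by FLT_ROUNDS. The mode may differ between compile time and run time, which may explain the difference. Knowing the result of fegetround() (the function that gets the current rounding direction) would help.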

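OP may have employed various compiler optimizations that sacrifice precision for speed.

C does not specify the precision of math operations, but on a quality platform results that are good to the last ULP should be expected. I suspect the code's math implementation is weak.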
