I have an OpenCL kernel that performs some computation. I found that exactly one thread gives a result that differs from the CPU code. I am using VS2010, x64, release mode.
By testing the OpenCL code with some examples, I found some interesting results. Here are the test cases from the kernel code.
I tested 3 cases in the OpenCL kernel; the precision is checked with printf("%.10f", fval);
float fval = (10296184.0) / (float)(x*y*z); // which gives result fval = 3351.6225585938
float fval = (10296184.0f) / (float)(x*y*z); // which gives result fval = 3351.6225585938
The variables are int x, y, z. Their values are computed by some operations and come out as x=12, y=16, z=16.
float fval = (10296184.0) / (float)(12*16*16); // which gives result fval = 3351.6223144531
float fval = (10296184.0f) / (float)(12*16*16); // which gives result fval = 3351.6223144531
However, when I compute the difference between the two expressions above, the result is 0 when using 10296184.0, but not when using 10296184.0f:
float fval = (10296184.0) / (float)(x*y*z) - (10296184.0) / (float)(12*16*16); // which gives result fval = 0.0000000000
float fval = (10296184.0f) / (float)(x*y*z) - (10296184.0f) / (float)(12*16*16); // which gives result fval = 0.0001812663
Could anyone explain the reason or give me some hints?
Answer 0 (score: 3)
Some observations:
The two float values differ by 1 ULP, so the results differ by the smallest possible amount.
// Float ULP in the 2's place here
//       v
0x1.a2f3ea0000000p+11  3351.622314...  // OP's lower float value
0x1.a2f3eaaaaaaabp+11  3351.622395...  // higher precision quotient
0x1.a2f3ec0000000p+11  3351.622558...  // OP's higher float value
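(10296184.0) / (float)(12*16*16) is computed at compile time, and it is the closer result to the expected mathematical answer.
float fval = (10296184.0) / (float)(x*y*z) is computed at run time.
Considering the use of float variables, it is surprising that the code does this division in double math. This is a double constant divided by a double (which is a promotion of the float product), yielding a double quotient that is converted to float and then saved. I would have expected 10296184.0f (note the f) to be used, so that the math could all be done in float.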
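C allows different rounding modes, indicated by FLT_ROUNDS. The mode may differ between compile time and run time, which might explain the difference. Knowing the result of fegetround() (the function that gets the current rounding direction) would help.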
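OP may have employed various compiler optimizations that sacrifice precision for speed.
C does not specify the precision of math operations, but on a quality platform a good last ULP should be expected. I suspect the code's math implementation is weak.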