I am working on an exercise from C Primer Plus about floating-point underflow. The task is to simulate it. This is what I did:
#include <stdio.h>
#include <float.h>
int main(void)
{
    // print min value for a positive float retaining full precision
    printf("%s\n %.150f\n", "Minimum positive float value retaining full precision:", FLT_MIN);
    // print min value for a positive float retaining full precision divided by two
    printf("%s\n %.150f\n", "Minimum positive float value retaining full precision divided by two:", FLT_MIN/2.0);
    // print min value for a positive float retaining full precision divided by four
    printf("%s\n %.150f\n", "Minimum positive float value retaining full precision divided by four:", FLT_MIN/4.0);
    return 0;
}
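I expected the minimum float value divided by 2 and by 4 to lose precision, but the precision looks fine and there is no underflow. How is that possible? What am I missing?
Many thanks.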
Answer 0 (score: 1)
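Simply dividing FLT_MIN (which is, of course, a power of 2) by 2 is the wrong way to assess precision. A power of 2 has only one significant bit, so halving it discards nothing until the value reaches zero.
Instead, start from a number just above a power of 2, so that its binary significand looks like 1.000...(maybe total of 24 binary digits)...0001. Make sure the value being printed is a float to begin with. (FLT_MIN/2.0 is a double.)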
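Note that precision is lost once the number drops below FLT_MIN, the smallest normalized positive float.
Also consider FLT_TRUE_MIN, the smallest positive float. See binary32.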
#include <float.h>
#include <math.h>
#include <stdio.h>
int main(void) {
    const char *format = "%.10e %a\n";
    printf(format, FLT_MIN, FLT_MIN);           // smallest normalized positive float
    printf(format, FLT_TRUE_MIN, FLT_TRUE_MIN); // smallest positive float (C11)
    float f = nextafterf(1.0f, 2.0f);           // just above 1: lowest significand bit set
    do {
        f /= 2;
        printf(format, f, f); // print in decimal and hex for detail
    } while (f);
    return 0;
}
Output
1.1754943508e-38 0x1p-126
1.4012984643e-45 0x1p-149
5.0000005960e-01 0x1.000002p-1
2.5000002980e-01 0x1.000002p-2
1.2500001490e-01 0x1.000002p-3
...
2.3509889819e-38 0x1.000002p-125
1.1754944910e-38 0x1.000002p-126
5.8774717541e-39 0x1p-127 // lost least significant bit of precision
2.9387358771e-39 0x1p-128
...
2.8025969286e-45 0x1p-148
1.4012984643e-45 0x1p-149
0.0000000000e+00 0x0p+0