Question

I am reading a C book that discusses the ranges of the floating-point types, and the author gives this table:

Type     Smallest Positive Value  Largest value      Precision
====     =======================  =============      =========
float    1.17549 x 10^-38         3.40282 x 10^38    6 digits
double   2.22507 x 10^-308        1.79769 x 10^308   15 digits

I don't know where the numbers in the Smallest Positive Value and Largest Value columns come from.

Answer 1

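A 32-bit floating-point number has a 23 + 1 bit mantissa and an 8-bit exponent (of which only -126 to 127 is used for normal numbers), so the largest number you can represent is: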

(1 + 1 / 2 + ... + 1 / (2 ^ 23)) * (2 ^ 127) =
(2 ^ 23 + 2 ^ 22 + ... + 1) * (2 ^ (127 - 23)) =
(2 ^ 24 - 1) * (2 ^ 104) ~= 3.4e38
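
The last line of that derivation can be checked directly. The following is a minimal sketch, assuming `float` is an IEEE-754 binary32 (which C does not guarantee); the function name `largest_float_from_formula` is made up for illustration.

```c
#include <float.h>
#include <math.h>

/* Largest finite float, built from the formula above:
 * a 24-bit all-ones significand (2^24 - 1) scaled by 2^(127 - 23).
 * Assumes float is IEEE-754 binary32. */
float largest_float_from_formula(void) {
    /* 16777215 == 2^24 - 1 fits exactly in a float's 24-bit significand,
     * and ldexpf only adjusts the exponent, so no rounding occurs. */
    return ldexpf(16777215.0f, 104);
}
```

On such a platform the returned value compares equal to `FLT_MAX` from `<float.h>`.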

Answer 2

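These numbers come from the IEEE-754 standard, which defines the standard representation of floating-point numbers. The linked Wikipedia article explains how to arrive at these ranges from the number of bits used for the sign, mantissa, and exponent.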

Answer 3

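The values of the float data type come from having 32 bits in total to represent the number, allocated like this:

1 bit: sign bit

8 bits: exponent p

23 bits: mantissa

The exponent is stored as p + BIAS, where BIAS is 127. The mantissa has 23 stored bits plus a 24th hidden bit that is assumed to be 1. This hidden bit is the most significant bit (MSB) of the mantissa, and the exponent must be chosen so that it is 1.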

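This means the smallest normal number you can represent is 00000000100000000000000000000000 (stored exponent 1, mantissa bits all 0), which is 1 x 2^-126 = 1.17549435E-38.

The largest value is 01111111011111111111111111111111 (stored exponent 254, mantissa bits all 1), because the all-ones exponent is reserved for infinity and NaN. Its mantissa is 2 * (1 - 1/16777216) and its exponent is 127, giving (1 - 1/16777216) * 2^128 = 3.40282347E38.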
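
To see the sign, exponent, and mantissa fields of a given value, the bits can be pulled apart as below. This is only a sketch: it assumes `float` is a 32-bit IEEE-754 binary32, and `struct float_fields` and `split_float` are names invented for this example.

```c
#include <stdint.h>
#include <string.h>

/* The three fields of an IEEE-754 single-precision value. */
struct float_fields {
    uint32_t sign;     /* 1 bit */
    uint32_t exponent; /* 8 bits, stored as p + 127 */
    uint32_t fraction; /* 23 bits, the hidden leading 1 is not stored */
};

/* Split a float into its raw fields.
 * Assumes float is a 32-bit IEEE-754 binary32. */
struct float_fields split_float(float f) {
    uint32_t bits;
    struct float_fields out;

    /* memcpy copies the object representation without aliasing problems. */
    memcpy(&bits, &f, sizeof bits);

    out.sign = bits >> 31;
    out.exponent = (bits >> 23) & 0xFFu;
    out.fraction = bits & 0x7FFFFFu;
    return out;
}
```

Under that assumption, `split_float(FLT_MIN)` yields exponent 1 and fraction 0, and `split_float(FLT_MAX)` yields exponent 0xFE and fraction 0x7FFFFF.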

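The same principle applies to double precision, except that the bits are:

1 bit: sign bit

11 bits: exponent bits

52 bits: mantissa bits

BIAS: 1023

Technically, the limits come from the IEEE-754 standard for representing floating-point numbers; the above is how those limits arise.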
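
Applying the same reasoning to the double layout gives the two limits from the table. A minimal sketch, assuming `double` is an IEEE-754 binary64; the function names are hypothetical.

```c
#include <float.h>
#include <math.h>

/* Smallest positive normal double: stored exponent 1, mantissa bits 0,
 * i.e. 2^(1 - 1023). Assumes double is IEEE-754 binary64. */
double smallest_normal_double(void) {
    return ldexp(1.0, -1022);
}

/* Largest finite double: stored exponent 2046, mantissa bits all 1,
 * i.e. (2^53 - 1) * 2^(1023 - 52). */
double largest_double(void) {
    /* 9007199254740991 == 2^53 - 1 is exactly representable in a double. */
    return ldexp(9007199254740991.0, 971);
}
```

On such a platform these compare equal to `DBL_MIN` (about 2.22507e-308) and `DBL_MAX` (about 1.79769e+308).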

Answer 4

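As dasblinkenlight has already answered, these numbers come from the way floating-point numbers are represented in IEEE-754, and Andreas's breakdown of the math is good.

However, note that the precision of a floating-point number is not exactly the 6 or 15 significant decimal digits shown in the table, because the precision of an IEEE-754 number depends on the number of significant binary digits.

float has 24 significant binary digits, which, depending on the number represented, translates to 6 to 8 decimal digits of precision.
double has 53 significant binary digits, which is roughly 15 decimal digits.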
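
The binary digit counts are exposed by `<float.h>`, and the 24-bit limit is easy to observe with integers. The sketch below assumes IEEE-754 `float` and `double` with round-to-nearest; `print_precision` and `survives_float` are illustrative names, not standard functions.

```c
#include <float.h>
#include <stdio.h>

/* Print the binary and guaranteed decimal digit counts from <float.h>. */
void print_precision(void) {
    printf("float : %d binary digits, %d decimal digits\n",
           FLT_MANT_DIG, FLT_DIG);
    printf("double: %d binary digits, %d decimal digits\n",
           DBL_MANT_DIG, DBL_DIG);
}

/* Returns 1 if the integer n survives a round trip through float. */
int survives_float(long n) {
    /* volatile forces the value to be stored with float precision. */
    volatile float f = (float)n;
    return (long)f == n;
}
```

With a 24-bit significand, `survives_float(16777216)` (2^24) returns 1, while `survives_float(16777217)` returns 0 because 2^24 + 1 needs 25 significant bits.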

If you are interested, another answer of mine has a further explanation.

Answer 5

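Infinity, NaN and subnormals

These are important caveats that none of the other answers have mentioned so far.

First read this introduction to IEEE 754 and subnormal numbers: What is a subnormal floating point number?

Then, for single-precision floats (32 bits):

IEEE 754 says that if the exponent bits are all ones (0xFF == 255), then the value is either NaN or Infinity.

That is why the largest non-infinite number has exponent 0xFE == 254 and not 0xFF.

Then, after subtracting the bias, it becomes: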
```
254 - 127 == 127
```
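FLT_MIN is the smallest normal number. But there are also smaller, subnormal ones! Those occupy the -127 exponent slot (stored exponent 0).

All the assertions in the following program pass on Ubuntu 18.04 amd64: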

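#include <assert.h>
#include <float.h>
#include <inttypes.h>
#include <math.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/* Build a float from its raw IEEE 754 fields:
 * sign (1 bit), exponent (8 bits), fraction (23 bits). */
float float_from_bytes(
    uint32_t sign,
    uint32_t exponent,
    uint32_t fraction
) {
    uint32_t bytes;
    float result;
    bytes = 0;
    bytes |= sign;
    bytes <<= 8;
    bytes |= exponent;
    bytes <<= 23;
    bytes |= fraction;
    /* memcpy rather than *(float*)&bytes, which breaks strict aliasing. */
    memcpy(&result, &bytes, sizeof(result));
    return result;
}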

int main(void) {
    /* All 1 exponent and non-0 fraction means NaN.
     * There are of course many possible representations,
     * and some have special semantics such as signalling vs not.
     */
    assert(isnan(float_from_bytes(0, 0xFF, 1)));
    assert(isnan(NAN));
    printf("nan                  = %e\n", NAN);

    /* All 1 exponent and 0 fraction means infinity. */
    assert(INFINITY == float_from_bytes(0, 0xFF, 0));
    assert(isinf(INFINITY));
    printf("infinity             = %e\n", INFINITY);

    /* ANSI C defines FLT_MAX as the largest non-infinite number. */
    assert(FLT_MAX == 0x1.FFFFFEp127f);
    /* Not 0xFF because that is infinite. */
    assert(FLT_MAX == float_from_bytes(0, 0xFE, 0x7FFFFF));
    assert(!isinf(FLT_MAX));
    assert(FLT_MAX < INFINITY);
    printf("largest non infinite = %e\n", FLT_MAX);

    /* ANSI C defines FLT_MIN as the smallest non-subnormal number. */
    assert(FLT_MIN == 0x1.0p-126f);
    assert(FLT_MIN == float_from_bytes(0, 1, 0));
    assert(isnormal(FLT_MIN));
    printf("smallest normal      = %e\n", FLT_MIN);

    /* The smallest non-zero subnormal number. */
    float smallest_subnormal = float_from_bytes(0, 0, 1);
    assert(smallest_subnormal == 0x0.000002p-126f);
    assert(0.0f < smallest_subnormal);
    assert(!isnormal(smallest_subnormal));
    printf("smallest subnormal   = %e\n", smallest_subnormal);

    return EXIT_SUCCESS;
}

GitHub upstream.

Compile and run:

gcc -ggdb3 -O0 -std=c11 -Wall -Wextra -Wpedantic -Werror -o subnormal.out subnormal.c
./subnormal.out

Output:

nan                  = nan
infinity             = inf
largest non infinite = 3.402823e+38
smallest normal      = 1.175494e-38
smallest subnormal   = 1.401298e-45
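
The smallest subnormal follows from the layout: the fraction's lowest bit is worth 2^-23, scaled by 2^-126, giving 2^-149 for float, and by the same argument 2^-52 * 2^-1022 = 2^-1074 for double. A minimal sketch, assuming IEEE-754 types with subnormals enabled (no flush-to-zero); the function names are made up for illustration. C11 also exposes these values as `FLT_TRUE_MIN` and `DBL_TRUE_MIN`.

```c
#include <float.h>
#include <math.h>

/* Smallest positive subnormal float: 2^-23 * 2^-126 == 2^-149.
 * Assumes IEEE-754 binary32 with subnormals enabled. */
float smallest_subnormal_float(void) {
    return ldexpf(1.0f, -149);
}

/* Smallest positive subnormal double: 2^-52 * 2^-1022 == 2^-1074.
 * Assumes IEEE-754 binary64 with subnormals enabled. */
double smallest_subnormal_double(void) {
    return ldexp(1.0, -1074);
}
```

Both results are greater than zero and smaller than `FLT_MIN` and `DBL_MIN` respectively.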

Answer 6

It is a consequence of the size of the exponent part of the type, as in IEEE 754 for example. You can check the limits with FLT_MAX, FLT_MIN, DBL_MAX, and DBL_MIN from float.h.
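
A small sketch of that check, formatting each macro with six significant digits so the output lines up with the table in the question. `format_limit` and `print_float_limits` are hypothetical helper names.

```c
#include <float.h>
#include <stddef.h>
#include <stdio.h>

/* Format v with six significant digits, e.g. "3.40282e+38". */
int format_limit(char *buf, size_t size, double v) {
    return snprintf(buf, size, "%.5e", v);
}

/* Print the four limits from <float.h>. */
void print_float_limits(void) {
    char buf[32];

    format_limit(buf, sizeof buf, FLT_MIN);
    printf("FLT_MIN = %s\n", buf);
    format_limit(buf, sizeof buf, FLT_MAX);
    printf("FLT_MAX = %s\n", buf);
    format_limit(buf, sizeof buf, DBL_MIN);
    printf("DBL_MIN = %s\n", buf);
    format_limit(buf, sizeof buf, DBL_MAX);
    printf("DBL_MAX = %s\n", buf);
}
```

On a platform with IEEE-754 float and double, the four values print as 1.17549e-38, 3.40282e+38, 2.22507e-308, and 1.79769e+308.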

Range of floating-point data types in C?
