Question

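For example, suppose 18.xxx is read in as a float input to the function. It is truncated to 18.0. From there it is encoded as 0 10011 0010000, which fits the required 13-bit float, and is returned as an int with the decimal value 2448. Does anyone know how to do this using shifts?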
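
To make the target layout concrete, here is a minimal sketch that assembles the three fields by hand. The helper name pack13 is hypothetical, and the 1/5/7 bit split with an exponent bias of 15 is inferred from the example bit pattern rather than taken from any stated spec.

```c
#include <assert.h>

// Hypothetical helper: packs 1 sign bit, 5 exponent bits and 7 fraction bits
// into the low 13 bits of an unsigned int (layout assumed from the example).
unsigned pack13(unsigned sign, unsigned exponent, unsigned fraction) {
    // each field must fit in its assumed width
    assert(sign <= 0x1 && exponent <= 0x1f && fraction <= 0x7f);
    return sign << 12 | exponent << 7 | fraction;
}
```

With 18.0 written as 1.125 * 2^4, the fields are sign 0, exponent 4 + 15 = 19 (10011) and fraction 0.125 (0010000), so pack13(0, 0x13, 0x10) yields 19 * 128 + 16 = 2448.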

Answer 1

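If your floats are represented in the 32-bit IEEE 754 single-precision binary format, with an unsigned (biased) exponent, this might do what you want: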

#include <stdio.h>
#include <string.h>
#include <assert.h>

unsigned short float32to13(float f) {

    assert(sizeof(float) == sizeof(unsigned int));

    unsigned int g;
    memcpy(&g, &f, sizeof(float)); // allow us to examine a float bit by bit

    unsigned int sign32 = (g >> 0x1f) & 0x1; // one bit sign
    unsigned int exponent32 = ((g >> 0x17) & 0xff) - 0x7f; // unbias 8 bits of exponent
    unsigned int fraction32 = g & 0x7fffff; // 23 bits of significand 

    assert(((exponent32 + 0xf) & ~ 0x1f) == 0); // don't overflow smaller exponent

    unsigned short sign13 = sign32;
    unsigned short exponent13 = exponent32 + 0xf; // rebias exponent by smaller amount
    unsigned short fraction13 = fraction32 >> 0x10; // drop lower 16 bits of significand precision

    return sign13 << 0xc | exponent13 << 0x7 | fraction13; // assemble a float13
}

int main() {

    float f = 18.0;

    printf("%u\n", float32to13(f));

    return 0;
}

Output

> ./a.out
2448
>

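I leave any endianness issues and other error checking to the end user. This example is provided only to show the OP the kinds of shifts needed to convert between floating-point formats. Any resemblance to an actual floating-point format is purely coincidental.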
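
As a sanity check, the same shifts can be run in reverse. The sketch below is not part of the original answer: float13to32 is a hypothetical name, it assumes the same 1/5/7 layout with a bias of 15, and like the code above it only handles normal values (no zero, denormals, infinities or NaN).

```c
#include <assert.h>
#include <string.h>

// Hypothetical inverse of float32to13: widen a 13-bit value back to a float.
float float13to32(unsigned short h) {

    assert(sizeof(float) == sizeof(unsigned int));

    unsigned int sign = (h >> 0xc) & 0x1; // one bit sign
    unsigned int exponent = ((h >> 0x7) & 0x1f) - 0xf + 0x7f; // swap bias 15 for bias 127
    unsigned int fraction = h & 0x7f; // 7 bits of significand

    // put the fields back in their 32 bit positions; the low 16 fraction bits stay zero
    unsigned int g = sign << 0x1f | exponent << 0x17 | fraction << 0x10;

    float f;
    memcpy(&f, &g, sizeof(float)); // reinterpret the bits as a float
    return f;
}
```

Feeding it 2448 should give back 18.0. Anything that was truncated on the way in, such as the .xxx part of 18.xxx, is of course not recovered.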

Bit shifting to pack a float into a 13-bit float in C?
