Question

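I need to write a function that converts from unsigned long long to float, and the rounding should be to nearest even. I can't just do a C++ type cast, because AFAIK the standard doesn't specify the rounding. I was thinking of using boost::numeric, but after reading the documentation I couldn't find any useful clue. Can it be done with that library? Of course, if there is an alternative, I'd be glad to use it.

Any help would be much appreciated.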

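EDIT: Adding an example to make things clearer. Suppose I want to convert 0xffffff7fffffffff to its floating-point representation. The C++ standard permits either of the following:

0x5f7fffff ≈ 1.9999999 * 2^63
0x5f800000 = 2^64

Now, if you add the restriction of rounding to nearest even, only the first result is acceptable.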
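To make the two candidates concrete, here is a minimal sketch that builds both neighbouring floats and exposes their bit patterns. It assumes float is IEEE 754 binary32; the names float_bits, lower_neighbour and upper_neighbour are made up for this illustration.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the raw bit pattern of a float.
// Assumes float is a 32-bit IEEE 754 binary32 value.
std::uint32_t float_bits(float f)
{
    std::uint32_t bits;
    static_assert(sizeof bits == sizeof f, "float must be 32 bits wide");
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// The two representable neighbours of 0xffffff7fffffffff.
float lower_neighbour() { return std::ldexp(16777215.0f, 40); } // (2^24 - 1) * 2^40
float upper_neighbour() { return std::ldexp(1.0f, 64); }        // 2^64
```

Under those assumptions, float_bits(lower_neighbour()) is 0x5f7fffff and float_bits(upper_neighbour()) is 0x5f800000. The input lies 2^39 - 1 above the lower neighbour and 2^39 + 1 below the upper one, so the lower neighbour is the nearest value.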

Answer 1

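Since the source has many bits that cannot be represented in a float, and you (apparently) cannot rely on the language's conversion, you will have to do it yourself.

I devised a scheme that may or may not help you. Basically, a float has 31 bits for representing positive numbers, so I take the 31 most significant bits of the source number. Then I save off and mask away all the lower bits. Then, depending on the value of the low bits, I round the "new" LSB up or down, and finally use static_cast to create the float.

I left in some couts that you can remove as needed.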

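#include <climits>   // CHAR_BIT
#include <iostream>

// Signed, so that the comparison with b below also works when b is -1 (val == 0).
const int mask_bit_count = 31;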

float ull_to_float2(unsigned long long val)
{
    // How many bits are needed?
    int b = sizeof(unsigned long long) * CHAR_BIT - 1;
    for(; b >= 0; --b)
    {
        if(val & (1ull << b))
        {
            break;
        }
    }

    std::cout << "Need " << (b + 1) << " bits." << std::endl;

    // If there are few enough significant bits, use normal cast and done.
    if(b < mask_bit_count)
    {
        return static_cast<float>(val);
    }

    // Save off the low-order useless bits:
    unsigned long long low_bits = val & ((1ull << (b - mask_bit_count)) - 1);
    std::cout << "Saved low bits=" << low_bits << std::endl;

    std::cout << val << "->mask->";
    // Now mask away those useless low bits:
    val &= ~((1ull << (b - mask_bit_count)) - 1);
    std::cout << val << std::endl;

    // Finally, decide how to round the new LSB:
    if(low_bits > ((1ull << (b - mask_bit_count)) / 2ull))
    {
        std::cout << "Rounding up " << val;
        // Round up.
        val |= (1ull << (b - mask_bit_count));
        std::cout << " to " << val << std::endl;
    }
    else
    {
        // Round down.
        val &= ~(1ull << (b - mask_bit_count));
    }

    return static_cast<float>(val);
}
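A float's significand holds FLT_MANT_DIG bits (24 for IEEE 754 binary32), so a variant of the scheme above that keeps exactly that many bits can decide the rounding entirely in integer arithmetic. The following is a minimal sketch under that assumption, not a drop-in replacement; the name ull_to_float_rne is hypothetical.

```cpp
#include <cfloat>
#include <climits>
#include <cmath>

// Hypothetical helper: round-to-nearest-even conversion done with integers.
// Assumes float is IEEE 754 binary32 (FLT_MANT_DIG == 24).
float ull_to_float_rne(unsigned long long val)
{
    const int total_bits = static_cast<int>(sizeof(val) * CHAR_BIT);
    const int prec = FLT_MANT_DIG;

    // Position of the highest set bit, or -1 when val == 0.
    int top = total_bits - 1;
    while (top >= 0 && !(val & (1ull << top))) --top;

    // At most prec significant bits: the plain conversion is exact.
    if (top < prec) return static_cast<float>(val);

    const int shift = top - prec + 1;                 // discarded bits, >= 1
    unsigned long long kept = val >> shift;           // the prec highest bits
    const unsigned long long rest = val & ((1ull << shift) - 1);
    const unsigned long long half = 1ull << (shift - 1);

    // Round up above the halfway point, or on an exact tie when kept is odd.
    if (rest > half || (rest == half && (kept & 1ull))) ++kept;

    // kept <= 2^prec, so both the cast and the scaling are exact.
    return std::ldexp(static_cast<float>(kept), shift);
}
```

With the value from the question, ull_to_float_rne(0xffffff7fffffffffull) returns (2^24 - 1) * 2^40, the first of the two candidates.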

Answer 2

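I did this in Smalltalk for arbitrary-precision integers (LargeInteger), implemented and tested in Squeak/Pharo/VisualWorks/GNU Smalltalk/Dolphin Smalltalk, and even blogged about it, if you can read Smalltalk code: http://smallissimo.blogspot.fr/2011/09/clarifying-and-optimizing.html

The trick for accelerating the algorithm is this: an IEEE 754 compliant FPU will correctly round the result of an inexact operation. So we can afford one inexact operation and let the hardware round correctly for us. That lets us easily handle the first 48 bits. But we cannot afford two inexact operations, so we sometimes have to take care of the lowest bits separately...

I hope the code is documented well enough: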

#include <math.h>
#include <float.h>
float ull_to_float3(unsigned long long val)
{
    int prec=FLT_MANT_DIG ;             // 24 bits, the float precision
    unsigned long long high=val>>prec;  // the high bits above float precision
    unsigned long long mask=(1ull<<prec) - 1 ;      // 0xFFFFFFull a mask for extracting significant bits
    unsigned long long tmsk=(1ull<<(prec - 1)) - 1; // 0x7FFFFFull, the same mask without the tie bit
    // handle trivial cases, 48 bits or less,
    // let FPU apply correct rounding after exactly 1 inexact operation
    if( high <= mask )
        return ldexpf((float) high,prec) + (float) (val & mask);
    // more than 48 bits,
    // what scaling s is needed to isolate highest 48 bits of val?
    int s = 0;
    for( ; high > mask ; high >>= 1) ++s;
    // high now contains highest 24 bits
    float f_high = ldexpf( (float) high , prec + s );
    // store next 24 bits in mid
    unsigned long long mid = (val >> s) & mask;
    // take care of the rare case where the trailing low bits can change the rounding:
    // can the mid bits be an exact tie or an exact zero?
    if( (mid & tmsk) == 0ull )
    {
        // if the low bits are zero, mid is either an exact tie or an exact zero;
        // otherwise just increment mid to distinguish it from such a case
        unsigned long long low = val & ((1ull << s) - 1);
        if(low > 0ull) mid++;
    }
    return f_high + ldexpf( (float) mid , s );
}

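Bonus: this code should round according to your FPU rounding mode, whatever it may be, since we implicitly use the FPU to perform the rounding with the + operation.
However, beware of aggressive optimizations in standards < C99; who knows when the compiler will use extended precision... (unless you force something like -ffloat-store). If you always want to round to nearest even, whatever the current rounding mode, then you will have to increment the high bits when:

mid bits > tie, where tie = 1ull << (prec-1);
mid bits == tie and (low bits > 0 or high bits is odd).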
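The two conditions can be written as a small predicate. This is only a sketch: the name round_high_up is hypothetical, and it assumes high holds exactly the top FLT_MANT_DIG bits of the value, mid the next FLT_MANT_DIG bits, and low whatever remains below them.

```cpp
#include <cfloat>

// Hypothetical predicate: should the high part be incremented
// under round-to-nearest-even? Assumes high holds exactly the top
// FLT_MANT_DIG bits, mid the next FLT_MANT_DIG bits, and low the rest.
bool round_high_up(unsigned long long high,
                   unsigned long long mid,
                   unsigned long long low)
{
    const unsigned long long tie = 1ull << (FLT_MANT_DIG - 1);
    if (mid > tie) return true;            // strictly above the halfway point
    if (mid < tie) return false;           // strictly below it, whatever low is
    return low > 0 || (high & 1ull) != 0;  // mid == tie: check low, then parity
}
```

Incrementing a high part that is already all ones carries into the next power of two, which is still exactly representable, so the caller only needs to scale the result afterwards.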

EDIT:
If you stick to round-to-nearest-even tie breaking, then another solution is to use Shewchuk's EXPANSION-SUM of the non-adjacent parts (fhigh, flow) and (fmid); see http://www-2.cs.cmu.edu/afs/cs/project/quake/public/papers/robust-arithmetic.ps:

#include <math.h>
#include <float.h>
float ull_to_float4(unsigned long long val)
{
    int prec=FLT_MANT_DIG ;             // 24 bits, the float precision
    unsigned long long mask=(1ull<<prec) - 1 ; // 0xFFFFFFull a mask for extracting significant bits
    unsigned long long high=val>>(2*prec);     // the high bits
    unsigned long long mid=(val>>prec) & mask; // the mid bits
    unsigned long long low=val & mask;         // the low bits
    float fhigh = ldexpf((float) high,2*prec);
    float fmid  = ldexpf((float) mid,prec);
    float flow  = (float) low;
    float sum1 = fmid + flow;
    float residue1 = flow - (sum1 - fmid);
    float sum2 = fhigh + sum1;
    float residue2 = sum1 - (sum2 - fhigh);
    return (residue1 + residue2) + sum2;
}

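This makes a branch-free algorithm with a few more operations. It may work for other rounding modes, but I'll let you analyze the paper to make sure...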
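Each sum/residue pair in ull_to_float4 follows a pattern usually called fast two-sum: the rounded sum plus the residue reproduces the exact sum. Here is a minimal sketch of one such step in isolation, assuming |a| >= |b| and round-to-nearest float arithmetic; the names SumAndResidue and fast_two_sum are made up for this illustration.

```cpp
struct SumAndResidue { float sum; float residue; };

// Hypothetical helper showing one sum/residue step.
// Assumes |a| >= |b| and round-to-nearest float arithmetic.
SumAndResidue fast_two_sum(float a, float b)
{
    // volatile keeps every intermediate rounded to float precision,
    // even if the compiler would otherwise use wider registers.
    volatile float sum = a + b;          // rounded sum
    volatile float recovered = sum - a;  // the part of b that made it into sum
    volatile float residue = b - recovered;
    return SumAndResidue{sum, residue};
}
```

For example, fast_two_sum(16777216.0f, 3.0f) yields sum 16777220 and residue -1: the exact result 16777219 is not a float, and the residue records what the rounding changed.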

Answer 3

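What is possible between 8-byte integers and the float format is straightforward to explain, but less so to implement!

The next paragraph concerns what is representable in 8-byte signed integers.

All positive integers between 1 (2^0) and 16777215 (2^24-1) are exactly representable in IEEE 754 single precision (float). Or, to be precise, all numbers between 2^0 and 2^24-2^0 in increments of 2^0. The next range of exactly representable positive integers is 2^1 to 2^25-2^1 in increments of 2^1, and so on, up to 2^39 to 2^63-2^39 in increments of 2^39.

Unsigned 8-byte integer values can be expressed up to 2^64-2^40 in increments of 2^40.

The single-precision format does not stop there: it continues all the way up to the range 2^104 to 2^128-2^104, in increments of 2^104.

For 4-byte integers (long), the highest float range is 2^7 to 2^31-2^7 in increments of 2^7.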
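The increments listed above can be checked directly by measuring the gap between a float and its successor. A minimal sketch, assuming IEEE 754 binary32 floats; gap_above is a hypothetical name.

```cpp
#include <cmath>

// Hypothetical helper: gap between x and the next larger float.
// Assumes x is finite, positive and below FLT_MAX.
float gap_above(float x)
{
    return std::nextafter(x, INFINITY) - x;
}
```

Under that assumption, gap_above(16777215.0f) is 1, gap_above(16777216.0f) is 2, and gap_above(std::ldexp(1.0f, 63)) is 2^40.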

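On the x86 architecture, the largest integer type supported by the floating-point instruction set is the 8-byte signed integer. 2^64-1 cannot be loaded by conventional means.

This means that for a given range increment expressed as "2^i, where i is an integer > 0", all integers that end in the bit patterns 0x1 up to 2^i-1 will not be exactly representable within that range in a float. It also means that what you call rounding upwards actually depends on the range you are working in. It is of no use to try to round up by 1 (2^0) or 16 (2^4) if the granularity of the range you are in is 2^19.

An additional consequence of what you propose to do (rounding 2^63-1 to 2^63) is that it could result in an overflow (of the long integer format) if you attempt the following conversion: longlong_int = (long long) ((float) 2^63).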
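One way to avoid that overflow is to test the float against the integer range before casting back. A minimal sketch, assuming long long is 64 bits wide; fits_in_long_long is a hypothetical name.

```cpp
#include <cmath>

// Hypothetical guard: true when casting f to long long is well defined.
// Assumes long long is 64 bits wide, so its range is [-2^63, 2^63).
bool fits_in_long_long(float f)
{
    const float limit = std::ldexp(1.0f, 63); // 2^63, exactly representable
    return f >= -limit && f < limit;          // a NaN makes this false
}
```

The float 2^63 is rejected, while its predecessor 2^63 - 2^39 is accepted.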

Check out this little program I wrote (in C), which should help illustrate what is possible and what is not.

int main (void)
{
  __int64 basel=1,baseh=16777215,src,dst,j;
  float cnvl,cnvh,range;
  int i=0;

  while (i<40)
  {
    src=basel<<i;
    cnvl=(float) src;
    dst=(__int64) cnvl;    /* compare dst with basel */

    src=baseh<<i;
    cnvh=(float) src;
    dst=(__int64) cnvh;    /* compare dst with baseh */

    j=basel;
    while (j<=baseh)
    {
      range=(float) j;
      dst=(__int64) range;

      if (j!=dst) dst/=0;

      j+=basel;
    }

    ++i;
  }
  return i;
}

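This program shows the representable integer value ranges. There is overlap between them: for example, 2^5 is representable in every range with a lower boundary 2^b, where 1 =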

Convert from unsigned long long to float with round to nearest even

3 answers: