Question

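Based on the question convert from float-point to custom numeric type, I found a portable, safe way to convert a floating-point type to an array of integers, and the code works fine. However, for some values, converting from double to unsigned long long fails even though the value has a precision that unsigned long long can safely represent. The failure is not a compile-time error but an invalid value: the minimum representable value of signed long long, or zero. The conversion fails with Visual C++ 2008, Intel XE 2013 and GCC 4.7.2.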

Here is the code (note the first statement inside the while loop in main):

#ifndef CHAR_BIT
#include <limits.h>
#endif
#include <float.h>
#include <math.h>

typedef signed int int32;
typedef signed long long int64;
typedef unsigned int uint32;
typedef unsigned long long uint64;
typedef float float32;
typedef double float64;

// get size of type in bits corresponding to CHAR_BIT.
template<typename t>
struct sizeof_ex
{
    static const uint32 value = sizeof(t) * CHAR_BIT;
};

// factorial function
float64 fct(int32 i)
{
    float64 r = 1;
    do r *= i; while(--i > 1);
    return r;
}

int main()
{
    // maximum 2 to power that can be stored in uint32
    const uint32 power_2 = uint32(~0);
    // number of binary digits in power_2
    const uint32 digit_cnt = sizeof_ex<uint32>::value;
    // number of array elements that will store expanded value
    const uint32 comp_count = DBL_MAX_EXP / digit_cnt + uint32((DBL_MAX_EXP / digit_cnt) * digit_cnt < DBL_MAX_EXP);
    // array elements
    uint32 value[comp_count];
    // get factorial for 23 (fct is not a template, so it is called without <float64>)
    float64 f = fct(23);
    // save sign for later correction
    bool sign = f < 0;
    // remove sign from float-point if exists
    if (sign) f *= -1;
    // get number of binary digits in f
    uint32 actual_digits = 0;
    frexp(f, (int32*)&actual_digits);
    // get start index in array for little-endian format
    uint32 start_index = (actual_digits / digit_cnt) + uint32((actual_digits / digit_cnt) * digit_cnt < actual_digits) - 1;
    // get all parts but the last
    while (start_index > 0)
    {
        // store current part
        // in this line the compiler fails
        value[start_index] = uint64(f / power_2);
        // exclude it from f
        f -= power_2 * float64(value[start_index]);
        // decrement index
        --start_index;
    }
    // get last part
    value[0] = uint32(f);
}

The conversion code above gives different results from one compiler to another. When the argument of the factorial function is, say, 20, all compilers return valid results. When it is greater than 20, some compilers get some of the parts right and others do not, and as the argument grows larger still, the result becomes zero.

Please tell me why these errors occur.

Thanks.

Answer 1

I don't think your conversion logic makes any sense.

You have a value called "power_2" which, despite its comment, isn't actually a power of 2.

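You extract the bits of a very large (> 64-bit) number by dividing it by a value smaller than 32 bits. Obviously the result will be larger than 32 bits, but you store it in a 32-bit value, which truncates it. You then multiply that stored value by the original divisor and subtract the product from your floating-point number. But because the quotient was truncated, you subtract far less than the original value, which is almost certainly not what you expected.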

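I think there is more wrong than that: you don't always want the top 32 bits. For a number whose length isn't a multiple of 32 bits, the top part should hold the actual length mod 32 bits.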

Here is a somewhat lazy hack of your code that does what I think you are trying to do. Note that the pow() call could be optimized.

while (start_index > 0)
{
    float64 fpow = pow(2., 32. * start_index);
    // store current part
    // in this line the compiler fails

    value[start_index] = uint32(f / fpow);
    // exclude it from f

    f -= fpow * float64(value[start_index]);
    // decrement index
    --start_index;
}

This is barely tested, but it hopefully illustrates what I mean.

Conversion from double to unsigned long long fails
