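Given a uint64_t value, is it possible to divide it by std::numeric_limits<uint64_t>::max() so that a floating-point representation of the value results (0.0 to 1.0, representing 0 to 2^64-1)?
Numbers greater than max can be treated as undefined behaviour, as long as every number less than or equal to max is divided correctly to its floating-point "counterpart" (or the closest number the floating-point type can represent in place of the real value).
I'm not sure that casting one (or both) sides to long double would produce correct values for all valid inputs, because the standard doesn't guarantee that long double has a 64-bit mantissa. Is this even possible?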
Answer 0 (score: 3)
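Multiprecision arithmetic is not needed. In floating-point arithmetic with a significand (aka mantissa) of fewer than 64 bits, the division of n by n_max = std::numeric_limits<uint64_t>::max() can be computed in a correctly rounded fashion (that is, the computed result is identical to the closest approximation of the exact arithmetic ratio in the target floating-point format) as follows:
$$\frac{n}{n_{\max}} = \frac{n}{2^{64}-1} = \frac{n}{2^{64}} \cdot \frac{1}{1-2^{-64}} = \frac{n}{2^{64}}\left(1 + 2^{-64} + 2^{-128} + \dots\right) = \frac{n}{2^{64}} + \text{(terms that do not fit in the significand)}$$
Thus, after rounding to the target format, the result is
$$\frac{n}{n_{\max}} = \frac{n}{2^{64}}$$
The following C++ test program implements both the naive and the accurate approach to computing the ratio n / n_max: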
#include <climits>
#include <cmath>
#include <iostream>
#include <limits>
#include <type_traits>

// Naive approach: convert both operands to F and divide.
template<typename F, typename U>
F map_to_unit_range_naive(U n)
{
    static_assert(std::is_floating_point<F>::value, "Result type must be a floating point type");
    static_assert(std::is_unsigned<U>::value, "Input type must be an unsigned integer type");
    return F(n)/F(std::numeric_limits<U>::max());
}

// "Accurate" approach: scale by 2^-N, where N is the bit width of U.
template<typename F, typename U>
F map_to_unit_range_accurate(U n)
{
    static_assert(std::is_floating_point<F>::value, "Result type must be a floating point type");
    static_assert(std::is_unsigned<U>::value, "Input type must be an unsigned integer type");
    const int UBITS = sizeof(U) * CHAR_BIT;
    return std::ldexp(F(n), -UBITS);
}

// Relative difference between the two approaches (0 when they agree).
template<class F, class U>
double error_mapping_to_unit_range(U n)
{
    const F r1 = map_to_unit_range_accurate<F>(n);
    const F r2 = map_to_unit_range_naive<F>(n);
    return (1-r2/r1);
}

#define CHECK_MAPPING_TO_UNIT_RANGE(n, result_type) \
    std::cout << "map_to_unit_range<" #result_type ">(" #n "): err=" \
              << error_mapping_to_unit_range<result_type>(n)*100 << "%" \
              << std::endl;

int main()
{
    // The ul inputs are 64-bit only where unsigned long is 64 bits wide (e.g. LP64).
    CHECK_MAPPING_TO_UNIT_RANGE(123u, float);
    CHECK_MAPPING_TO_UNIT_RANGE(123ul, float);
    CHECK_MAPPING_TO_UNIT_RANGE(1234567890u, float);
    CHECK_MAPPING_TO_UNIT_RANGE(1234567890ul, float);
    std::cout << "\n";
    CHECK_MAPPING_TO_UNIT_RANGE(123ul, double);
    CHECK_MAPPING_TO_UNIT_RANGE(1234567890ul, double);
    return 0;
}
The program shows that the naive approach gives the same results as the carefully crafted code:
map_to_unit_range<float>(123u): err=0%
map_to_unit_range<float>(123ul): err=0%
map_to_unit_range<float>(1234567890u): err=0%
map_to_unit_range<float>(1234567890ul): err=0%
map_to_unit_range<double>(123ul): err=0%
map_to_unit_range<double>(1234567890ul): err=0%
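This may seem surprising at first, but it has a simple explanation: if the floating-point type cannot represent the integral value 2^N - 1 exactly, the value is rounded to 2^N, which effectively turns the next step into the accurate division (per the formula above).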
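The rounding described above can be checked directly. The following is a minimal sketch, assuming the implementation rounds integer-to-floating-point conversions to nearest (the usual IEEE 754 default); the helper name max_rounds_to_power_of_two is hypothetical and not part of the answer's program.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper (not from the answer): returns true when converting the
// largest value of U to F lands exactly on 2^N, N being the bit width of U.
// Assumes round-to-nearest integer-to-floating-point conversion.
template <typename F, typename U>
bool max_rounds_to_power_of_two()
{
    const int N = std::numeric_limits<U>::digits;                        // bit width of U
    const F converted = static_cast<F>(std::numeric_limits<U>::max());   // 2^N - 1, possibly rounded
    return converted == std::ldexp(F(1), N);                             // compare against exact 2^N
}
```

Under that assumption the helper returns true for <float, std::uint64_t> and <double, std::uint64_t>, and false for <double, std::uint32_t>, because double represents 2^32 - 1 exactly.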
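Note that when the precision of the floating-point type exceeds the size of the integer type (so that 2^N - 1 can be represented exactly), the premise of the formula is not satisfied and the "accurate" approach is no longer accurate: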
int main()
{
    CHECK_MAPPING_TO_UNIT_RANGE(123u, double);
    CHECK_MAPPING_TO_UNIT_RANGE(1234567890u, double);
    return 0;
}
Output:
map_to_unit_range<double>(123u): err=-2.32831e-08%
map_to_unit_range<double>(1234567890u): err=-2.32831e-08%
Here the "error" comes from the "accurate" approach.
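One way to combine the two observations is to pick the method from the precisions of the two types. The sketch below is an illustration rather than part of the answer: the name map_to_unit_range_auto is hypothetical, and it assumes IEEE-style correctly rounded division in the branch where the floating-point type holds every value of the integer type exactly.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical wrapper (not from the answer): chooses between scaling by 2^-N
// and plain division, depending on whether F can represent 2^N - 1 exactly.
template <typename F, typename U>
F map_to_unit_range_auto(U n)
{
    static_assert(std::is_floating_point<F>::value, "Result type must be a floating point type");
    static_assert(std::is_unsigned<U>::value, "Input type must be an unsigned integer type");
    constexpr int ubits = std::numeric_limits<U>::digits;  // bit width N of U
    if (std::numeric_limits<F>::digits < ubits) {
        // F cannot hold 2^N - 1 exactly: scale by 2^-N, following the derivation.
        return std::ldexp(static_cast<F>(n), -ubits);
    }
    // F holds every value of U exactly, so dividing by the true maximum is safe.
    return static_cast<F>(n) / static_cast<F>(std::numeric_limits<U>::max());
}
```

With this selection, both map_to_unit_range_auto<double>(std::numeric_limits<std::uint32_t>::max()) and map_to_unit_range_auto<double>(std::numeric_limits<std::uint64_t>::max()) evaluate to 1.0.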
Credits:
Many thanks to @interjay and @Jonathan Mee for their thorough peer review of previous versions of this answer.
Answer 1 (score: 1)
I think the simplest, most rigorous portable way is boost::multiprecision::cpp_bin_float_quad:
#include <boost/multiprecision/cpp_bin_float.hpp>
#include <limits>
#include <cstdint>
#include <iostream>
#include <iomanip>

int main()
{
    using Float = boost::multiprecision::cpp_bin_float_quad;
    for (std::uint64_t i = 0 ; i < 64 ; ++i)
    {
        auto v = std::uint64_t(1) << i;
        auto x = Float(v);
        x /= std::numeric_limits<std::uint64_t>::max();
        // demonstrate lossless round-trip
        auto y = x * std::numeric_limits<std::uint64_t>::max();
        std::cout << std::setprecision(std::numeric_limits<Float>::digits10)
                  << (x * 100) << "% : "
                  << std::hex << y.convert_to<std::uint64_t>()
                  << std::endl;
    }
}
Expected results:
5.42101086242752217033113759205528e-18% : 1
1.08420217248550443406622751841106e-17% : 2
2.16840434497100886813245503682211e-17% : 4
4.33680868994201773626491007364422e-17% : 8
8.67361737988403547252982014728845e-17% : 10
1.73472347597680709450596402945769e-16% : 20
3.46944695195361418901192805891538e-16% : 40
6.93889390390722837802385611783076e-16% : 80
1.38777878078144567560477122356615e-15% : 100
2.7755575615628913512095424471323e-15% : 200
5.55111512312578270241908489426461e-15% : 400
1.11022302462515654048381697885292e-14% : 800
2.22044604925031308096763395770584e-14% : 1000
4.44089209850062616193526791541169e-14% : 2000
8.88178419700125232387053583082337e-14% : 4000
1.77635683940025046477410716616467e-13% : 8000
3.55271367880050092954821433232935e-13% : 10000
7.1054273576010018590964286646587e-13% : 20000
1.42108547152020037181928573293174e-12% : 40000
2.84217094304040074363857146586348e-12% : 80000
5.68434188608080148727714293172696e-12% : 100000
1.13686837721616029745542858634539e-11% : 200000
2.27373675443232059491085717269078e-11% : 400000
4.54747350886464118982171434538157e-11% : 800000
9.09494701772928237964342869076313e-11% : 1000000
1.81898940354585647592868573815263e-10% : 2000000
3.63797880709171295185737147630525e-10% : 4000000
7.27595761418342590371474295261051e-10% : 8000000
1.4551915228366851807429485905221e-09% : 10000000
2.9103830456733703614858971810442e-09% : 20000000
5.8207660913467407229717943620884e-09% : 40000000
1.16415321826934814459435887241768e-08% : 80000000
2.32830643653869628918871774483536e-08% : 100000000
4.65661287307739257837743548967072e-08% : 200000000
9.31322574615478515675487097934145e-08% : 400000000
1.86264514923095703135097419586829e-07% : 800000000
3.72529029846191406270194839173658e-07% : 1000000000
7.45058059692382812540389678347316e-07% : 2000000000
1.49011611938476562508077935669463e-06% : 4000000000
2.98023223876953125016155871338926e-06% : 8000000000
5.96046447753906250032311742677853e-06% : 10000000000
1.19209289550781250006462348535571e-05% : 20000000000
2.38418579101562500012924697071141e-05% : 40000000000
4.76837158203125000025849394142282e-05% : 80000000000
9.53674316406250000051698788284564e-05% : 100000000000
0.000190734863281250000010339757656913% : 200000000000
0.000381469726562500000020679515313826% : 400000000000
0.000762939453125000000041359030627651% : 800000000000
0.0015258789062500000000827180612553% : 1000000000000
0.00305175781250000000016543612251061% : 2000000000000
0.00610351562500000000033087224502121% : 4000000000000
0.0122070312500000000006617444900424% : 8000000000000
0.0244140625000000000013234889800848% : 10000000000000
0.0488281250000000000026469779601697% : 20000000000000
0.0976562500000000000052939559203394% : 40000000000000
0.195312500000000000010587911840679% : 80000000000000
0.390625000000000000021175823681358% : 100000000000000
0.781250000000000000042351647362715% : 200000000000000
1.56250000000000000008470329472543% : 400000000000000
3.12500000000000000016940658945086% : 800000000000000
6.25000000000000000033881317890172% : 1000000000000000
12.5000000000000000006776263578034% : 2000000000000000
25.0000000000000000013552527156069% : 4000000000000000
50.0000000000000000027105054312138% : 8000000000000000
Better performance can be obtained with boost::multiprecision::float128, but it only works with gcc (specify -std=gnu++NN) or the Intel compilers.
Answer 2 (score: 1)
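I'll take my cue from this part of your question:
"I'm not sure that casting one (or both) sides to long double would produce correct values for all valid inputs, because the standard doesn't guarantee that long double has a 64-bit mantissa. Is this even possible?"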
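What you're asking is: can any value that uint64_t can represent survive being converted to the mantissa of a long double and back to uint64_t? The answer is implementation-dependent. The key is the number of bits long double uses for its mantissa. Fortunately, C++ gives you a way to find that out: numeric_limits<long double>::digits.
For example: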
// Fragment: assumes <iostream>, <limits>, <cstdint> and `using namespace std;`
const auto ui64max = numeric_limits<uint64_t>::max();
const auto foo = ui64max - 1;
const auto bar = static_cast<long double>(foo) / ui64max;
cout << "Max Digits For Roundtrip Guarantee: " << numeric_limits<long double>::digits
     << "\nMax Digits In uint64_t: " << numeric_limits<uint64_t>::digits
     << "\nConverting: " << foo << "\nTo long double Mantissa: " << bar
     << "\nRoundtrip Back To uint64_t: " << static_cast<uint64_t>(bar * ui64max) << endl;
You can verify this at compile time with the following:
static_assert(numeric_limits<long double>::digits >= numeric_limits<uint64_t>::digits, "long double has insufficient mantissa precision in this implementation");
For more on the math behind the round-trip question, see: Float Fractional Precision
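The same question can also be checked at run time for an individual value. The following is a minimal sketch; the helper names survives_round_trip and long_double_holds_every_uint64 are hypothetical, and the sketch tests only the plain conversion to long double and back, not the divide-and-multiply sequence shown in the example above.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper (not from the answer): true when long double has at
// least as many significand bits as uint64_t has value bits.
bool long_double_holds_every_uint64()
{
    return std::numeric_limits<long double>::digits >=
           std::numeric_limits<std::uint64_t>::digits;
}

// Hypothetical helper (not from the answer): reports whether v survives
// uint64_t -> long double -> uint64_t unchanged on this implementation.
bool survives_round_trip(std::uint64_t v)
{
    const long double f = static_cast<long double>(v);
    // 2^64 is a power of two, so long double represents it exactly.
    const long double two_pow_64 = std::ldexp(1.0L, 64);
    // If v was rounded up to 2^64, converting back would be out of range
    // (undefined behaviour), so report failure instead of casting.
    if (f >= two_pow_64) {
        return false;
    }
    return static_cast<std::uint64_t>(f) == v;
}
```

Powers of two such as std::uint64_t(1) << 63 pass on any implementation; whether std::numeric_limits<std::uint64_t>::max() passes depends on the digits check.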