Question

通过银行家的舍入我的意思是

“舍入到最近，与...连接”

舍入到最接近的值;如果数字中间下降，则使用偶数（零）最低有效位舍入到最接近的值。这是二进制浮点的默认值，也是十进制的推荐默认值。

据说这种方法优于

“舍入到最近，与零相关”

on the grounds that它“在对四舍五入的数字求和时最小化预期误差”。显然这是because“它不会像在大多数合理分布上的零方法那样偏离负半周或正偏差”。

我不明白为什么会这样。直观地，如果0.0向零舍入，0.5“应该”从零舍入（如方法2中所示）。这样，相同数量的数字将朝向零并且远离零。换句话说，如果浮动数字用1个十进制数字表示，则十个数字0.0，...，0.9五个将向下舍入，五个将用方法2四舍五入。同样适用于1.0，...，1.9等

当然浮点数用二进制尾数表示，但我认为上述推理仍然适用。请注意，对于IEEE 754双精度，整数和整数加半精度可以精确表示大约2^52的绝对值，因此这些精确值实际上会显示出来。

那么方法1如何更好？

Answer 1

是!它确实在数值上更稳定。

对于您正在查看的数字[0.0, 0.1, ..., 0.9]，请注意，在圆形关系下，只有四个这些数字正在向下舍入（ 0.1到0.4），五个被四舍五入，一个（0.0）由舍入操作保持不变，当然这个模式会重复1.0到{{ 1}}，1.9到2.0等等。因此，平均而言，更多的值从零开始舍入而不是朝向它。但在圆形关系下，我们得到：

在2.9
四个值向下舍入，五个向上舍入[0.0, 0.9]

等等。平均而言，我们得到的数值相同，即四舍五入。更重要的是，舍入引入的预期误差（在对输入分布的适当假设下）接近于零。

这是使用Python的快速演示。为了避免由于内置[1.0, 1.9]函数中的Python 2 / Python 3差异而导致的困难，我们提供了两个与Python版本无关的舍入函数：

round

现在我们来看看在def round_ties_to_even(x): """ Round a float x to the nearest integer, rounding ties to even. """ if x < 0: return -round_ties_to_even(-x) # use symmetry int_part, frac_part = divmod(x, 1) return int(int_part) + ( frac_part > 0.5 or (frac_part == 0.5 and int_part % 2.0 == 1.0)) def round_ties_away_from_zero(x): """ Round a float x to the nearest integer, rounding ties away from zero. """ if x < 0: return -round_ties_away_from_zero(-x) # use symmetry int_part, frac_part = divmod(x, 1) return int(int_part) + (frac_part >= 0.5)范围内的一位数后点小数值上应用这两个函数所引入的平均误差：

[50.0, 100.0]

我们使用最近添加的>>> test_values = [n / 10.0 for n in range(500, 1001)] >>> errors_even = [round_ties_to_even(value) - value for value in test_values] >>> errors_away = [round_ties_away_from_zero(value) - value for value in test_values]标准库模块来计算这些错误的均值和标准差：

statistics

这里的关键点是>>> import statistics >>> statistics.mean(errors_even), statistics.stdev(errors_even) (0.0, 0.2915475947422656) >>> statistics.mean(errors_away), statistics.stdev(errors_away) (0.0499001996007984, 0.28723681870533313)具有零均值：平均误差为零。但errors_even具有正平均值：平均误差偏离零。

一个更现实的例子

这是一个半现实的例子，它展示了数值算法中从零到零的偏差。我们将使用pairwise summation算法计算浮点数列表的总和。该算法将要计算的总和分成两个大致相等的部分，递归地对这两个部分求和，然后将结果相加。它比天真的总和更准确，但通常不如Kahan summation等更复杂的算法那么好。它是NumPy errors_away函数使用的算法。这是一个简单的Python实现。

sum

我们在上面的函数中包含了一个参数import operator def pairwise_sum(xs, i, j, add=operator.add): """ Return the sum of floats xs[i:j] (0 <= i <= j <= len(xs)), using pairwise summation. """ count = j - i if count >= 2: k = (i + j) // 2 return add(pairwise_sum(xs, i, k, add), pairwise_sum(xs, k, j, add)) elif count == 1: return xs[i] else: # count == 0 return 0.0，表示要用于添加的操作。默认情况下，它使用Python的常规加法算法，在典型的机器上将使用round-ties-to-even舍入模式解析为标准的IEEE 754加法。

我们希望查看来自add函数的预期错误，同时使用标准加法和使用Round-ties-from-zero版本的加法。我们的第一个问题是，我们没有一种简单易用的方法来改变Python内部硬件的舍入模式，二进制浮点的软件实现会很大而且很慢。幸运的是，我们可以使用这个技巧在仍然使用硬件浮点的同时获得零距离。对于这个技巧的第一部分，我们可以使用Knuth＆＃34; 2Sum＆＃34;算法添加两个浮点数并获得正确舍入的和以及该总和中的精确错误：

pairwise_sum

有了这个，我们可以轻松地使用误差项来确定精确总和何时达到平局。当且仅当def exact_add(a, b): """ Add floats a and b, giving a correctly rounded sum and exact error. Mathematically, a + b is exactly equal to sum + error. """ # This is Knuth's 2Sum algorithm. See section 4.3.2 of the Handbook # of Floating-Point Arithmetic for exposition and proof. sum = a + b bv = sum - a error = (a - (sum - bv)) + (b - bv) return sum, error非零并且error完全可以表示时，我们才会有一个平局，在这种情况下，sum + 2*error和sum是最接近该领带的两个浮点数。使用这个想法，这里有一个函数可以添加两个数字并给出一个正确的舍入结果，但是将远离从零开始。

sum + 2*error

现在我们可以比较结果。 def add_ties_away(a, b): """ Return the sum of a and b. Ties are rounded away from zero. """ sum, error = exact_add(a, b) sum2, error2 = exact_add(sum, 2.0*error) if error2 or not error: # Not a tie. return sum else: # Tie. Choose the larger of sum and sum2 in absolute value. return max([sum, sum2], key=abs)是一个函数，它生成[1,2]范围内的浮点数列表，使用正常的round-ties-to-even加法和我们的自定义round-ties-from-zero版本添加它们，与完全总和进行比较，并返回两个版本的错误，以最后一个位置为单位进行测量。

sample_sum_errors

以下是一个示例运行：

import fractions
import random

def sample_sum_errors(sample_size=1024):
    """
    Generate `sample_size` floats in the range [1.0, 2.0], sum
    using both addition methods, and return the two errors in ulps.
    """
    xs = [random.uniform(1.0, 2.0) for _ in range(sample_size)]
    to_even_sum = pairwise_sum(xs, 0, len(xs))
    to_away_sum = pairwise_sum(xs, 0, len(xs), add=add_ties_away)

    # Assuming IEEE 754, each value in xs becomes an integer when
    # scaled by 2**52; use this to compute an exact sum as a Fraction.
    common_denominator = 2**52
    exact_sum = fractions.Fraction(
        sum(int(m*common_denominator) for m in xs),
        common_denominator)

    # Result will be in [1024, 2048]; 1 ulp in this range is 2**-44.
    ulp = 2**-44
    to_even_error = (fractions.Fraction(to_even_sum) - exact_sum) / ulp
    to_away_error = (fractions.Fraction(to_away_sum) - exact_sum) / ulp

    return to_even_error, to_away_error

因此，使用标准加法会产生1.6 ulps的误差，并且当将关系从零处舍入时，误差为9.6 ulps。它肯定看起来好像从零开始的方法更糟糕，但单次运行并不是特别有说服力。让我们这样做10000次，每次使用不同的随机样本，并绘制我们得到的错误。这是代码：

>>> sample_sum_errors()
(1.6015625, 9.6015625)

当我在我的机器上运行上述功能时，我看到：

import statistics
import numpy as np
import matplotlib.pyplot as plt

def show_error_distributions():
    errors = [sample_sum_errors() for _ in range(10000)]
    to_even_errors, to_away_errors = zip(*errors)
    print("Errors from ties-to-even: "
          "mean {:.2f} ulps, stdev {:.2f} ulps".format(
              statistics.mean(to_even_errors),
              statistics.stdev(to_even_errors)))
    print("Errors from ties-away-from-zero: "
          "mean {:.2f} ulps, stdev {:.2f} ulps".format(
              statistics.mean(to_away_errors),
              statistics.stdev(to_away_errors)))

    ax1 = plt.subplot(2, 1, 1)
    plt.hist(to_even_errors, bins=np.arange(-7, 17, 0.5))
    ax2 = plt.subplot(2, 1, 2)
    plt.hist(to_away_errors, bins=np.arange(-7, 17, 0.5))
    ax1.set_title("Errors from ties-to-even (ulps)")
    ax2.set_title("Errors from ties-away-from-zero (ulps)")
    ax1.xaxis.set_visible(False)
    plt.show()

我得到以下情节：

我计划更进一步，对两个样本的偏差进行统计检验，但是从零关系方法的偏差是如此明显，以至于看起来没必要。有趣的是，虽然从零开始的方法给出了较差的结果，但会给出较小的错误分布。

“银行家的四舍五入”真的在数值上更稳定吗？

1 个答案:

一个更现实的例子