Given two doubles, how to determine if their quotient is exact?

Time: 2017-06-21 07:32:40

Tags: java math floating-point ieee-754

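Given two double values, p and q, how can I determine whether their quotient

double result = p / q;

is an exact result in terms of the binary values of p and q?

That is, whether result is exactly equal to the mathematical quotient of p and q.

Clearly this is true for some values, such as 1.0 / 2.0, and false for others, such as 1.0 / 5.0, so I'm looking for an idiomatic and accurate way to separate the cases.

It seems like the floating-point modulus p % q == 0 might work, but I'm not sure!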

3 answers:

Answer 0 (score: 4)

You can use BigDecimal to check whether the division is exact:

private static boolean canDivideExact(double p, double q) {
  double r = p / q;
  BigDecimal d = new BigDecimal(r);
  return d.multiply(new BigDecimal(q)).compareTo(new BigDecimal(p)) == 0;
}

For example:

System.out.println(canDivideExact(1, 2)); //true
System.out.println(canDivideExact(1, 3)); //false
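
One caveat worth illustrating: new BigDecimal(double) throws NumberFormatException for NaN and infinite arguments, so the method above fails for inputs such as canDivideExact(1, 0). A minimal sketch of a guarded variant follows. The class name ExactQuotientBigDecimal is hypothetical, and reporting non-finite inputs or results as "not exact" is an assumption of this sketch rather than part of the answer.

```java
import java.math.BigDecimal;

final class ExactQuotientBigDecimal {
    /**
     * Returns true if p / q, computed in double arithmetic, equals the
     * mathematical quotient. Non-finite inputs or results are reported
     * as inexact (an assumption of this sketch).
     */
    static boolean canDivideExact(double p, double q) {
        double r = p / q;
        // new BigDecimal(double) throws NumberFormatException for NaN and
        // infinities, so filter those out before converting.
        if (!Double.isFinite(p) || !Double.isFinite(q) || !Double.isFinite(r)) {
            return false;
        }
        // Every finite double has a finite decimal expansion, so the
        // BigDecimal product r * q is computed without rounding.
        return new BigDecimal(r).multiply(new BigDecimal(q))
                .compareTo(new BigDecimal(p)) == 0;
    }

    public static void main(String[] args) {
        System.out.println(canDivideExact(1.0, 2.0)); // true
        System.out.println(canDivideExact(1.0, 3.0)); // false
        System.out.println(canDivideExact(1.0, 0.0)); // false (quotient is infinite)
    }
}
```

Double.isFinite requires Java 8 or later.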

Answer 1 (score: 3)

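Here are two approaches that don't involve BigDecimal. As written, neither works for subnormals, but if you can avoid subnormals, underflow and overflow, both should give good results. I'd expect that both methods could be adapted to work in the subnormal case, but I haven't thought about how to do so.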

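  1. Any finite nonzero double x can be written uniquely in the form x = m * 2^e for some integers e and m, with m odd. Let's call m the odd part of x. Now, given two nonzero finite doubles x and y, and assuming that overflow and underflow are avoided, x / y is exactly representable if and only if the odd part of x is an integer multiple of the odd part of y. We can check the integer-multiple condition using %, so all that remains is to find a way to compute the odd part. In C or Python, I'd use frexp, throw away the exponent, and then repeatedly multiply the fraction by 2 until it's an integer, but frexp doesn't appear to be available in Java. However, Java does have Math.getExponent, which supplies the exponent part of frexp, and Math.scalb can then be used to get the fraction.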

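  2. After computing x / y and getting a (possibly rounded) result z, you can multiply y by z using double-double arithmetic (via Veltkamp splitting and Dekker multiplication), and check that the result is exactly equal to x. This should be more efficient than the equivalent method using BigDecimal, since we know in advance that we don't need more than twice the usual floating-point precision to hold the result.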

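I'm afraid I'm not fluent enough in Java to give code, but here's code in Python that should be straightforward to adapt to Java. (Note that Python's float type matches Java's double on a typical machine; in theory, Python doesn't require IEEE 754, but in practice it's almost inevitable that Python's float format will be IEEE 754 binary64.)

If anyone wants to steal this code, turn it into Java, and turn it into an answer, I'll happily upvote.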

    import math
    
    def odd_part(x):
        """
        Return an odd integer m (as a double) such that x can be written
        in the form m * 2**e for some exponent e. The exponent e is not
        returned.
        """
        fraction, exponent = math.frexp(x)
    
        # here fraction * 2**53 is guaranteed to be an integer, so we
        # don't need to loop more than 53 times.
        while fraction % 1.0 != 0.0:  # or in Python, use the is_integer method.
            fraction *= 2.0
        return fraction
    
    
    # Constant used in Veltkamp splitting.
    C = float.fromhex('0x1.0000002000000p+27')
    
    def split(x):
        """
        Split a double x into pieces x_hi, x_lo, each
        expressible with 26 bits of precision.
    
        Algorithm due to Veltkamp.
    
        Parameters
        ----------
        x : float
            Finite float, such that C*x does not overflow. Assumes IEEE 754
            representation and arithmetic, with round-ties-to-even rounding
            mode.
    
        Returns
        -------
        l, h : float
            l and h are both representable in 26 bits of precision, and
            x = l + h.
    
        """
        # Idea of proof: without loss of generality, we can reduce to the case
        # where 0.5 < x < 1 (the case where x is a power of 2 is straightforward).
        # Write rnd for the floating-point rounding operation, so p = rnd(Cx) and q
        # = rnd(x-p).
        #
        # Now let e and f be the errors induced by the floating-point operations,
        # so
        #     p = Cx + e
        #     q = x - p + f
        #
        # Then it's easy to show that:
        #
        #  2**26 < |Cx| < 2**28, so p is a multiple of 2**-26 and |e| <= 2**-26.
        #  2**26 <= p - x <= 2**27, so q is a multiple of 2**-26 and |f| <= 2**-27.
        #  h = p + q is exactly representable, equal to x + f
        #  h <= 1, and h is a multiple of 2**-26, so h has precision <= 26.
        #  l = x - h is exactly representable, equal to f.
        #  |f| <= 2**-27, and f is a multiple of 2**-53, so f has precision <= 26.
    
        p = C * x
        q = x - p
        h = p + q
        l = x - h
        return l, h
    
    
    def exact_mult(x, y):
        """
        Multiply floats x and y exactly, expressing the result as l + h,
        where h is the closest float to x * y and l is the error.
    
        Algorithm is due to Dekker.
    
        Assumes that x and y are finite IEEE 754 binary64 floats.
    
        May return inf or nan due to intermediate overflow.
    
        May raise ValueError on underflow or near-underflow.
    
        If both results are finite, then we have equality:
    
           x * y = h + l
    
        """
        # Write |x| = M * 2**e, |y| = N * 2**f, for some M and N with
        # M, N <= 2**53 - 1. Then xy = M*N*2**(e+f). If e + f <= -1075
        # then xy < (2**53 - 1)**2 * 2**-1075 < 2**-969 (1 - 2**-53),
        # which is exactly representable.
        # Hence the rounded value of |xy| is also < 2**-969.
    
        # So if |xy| >= 2**-969, and |xy| isn't near overflow, it follows that x*y
        # *can* be expressed as the sum of two doubles.
    
        # If |xy| < 2**-969, we can't guarantee it, and we raise ValueError.
    
        h = x * y
    
        if abs(h) < 2**-969 and x != 0.0 and y != 0.0:
            raise ValueError("Cannot guarantee exact result.")
    
        xl, xh = split(x)
        yl, yh = split(y)
        return -h + xh * yh + xh * yl + xl * yh + xl * yl, h
    
    
    def division_exact_method_1(x, y):
        """
        Given nonzero finite not-too-large not-too-small floats x and y,
        return True if x / y is exactly representable, else False.
        """
        return odd_part(x) % odd_part(y) == 0
    
    
    def division_exact_method_2(x, y):
        """
        Given nonzero finite not-too-large not-too-small floats x and y,
        return True if x / y is exactly representable, else False.
        """
        z = x / y
        low, high = exact_mult(y, z)
        return high == x and low == 0
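
As a rough illustration of how the Python above might be carried over to Java, here is a sketch of both methods. The class name ExactDivision and all method names are hypothetical. oddPart uses Math.getExponent and Math.scalb in place of frexp, as suggested in the description of method 1. The argument check in oddPart and the use of ArithmeticException where the Python raises ValueError are choices of this sketch. Like the Python, it assumes nonzero finite inputs and no overflow or underflow.

```java
final class ExactDivision {
    // Constant used in Veltkamp splitting: 2^27 + 1.
    private static final double C = 134217729.0;

    /** Odd integer m (as a double) such that x == m * 2^e for some integer e. */
    static double oddPart(double x) {
        if (x == 0.0 || !Double.isFinite(x)) {
            throw new IllegalArgumentException("x must be finite and nonzero");
        }
        // Math.scalb(x, -exponent) plays the role of frexp's fraction;
        // for normal x its magnitude lies in [1, 2).
        double fraction = Math.scalb(x, -Math.getExponent(x));
        // At most 52 doublings are needed before the fraction is an integer.
        while (fraction % 1.0 != 0.0) {
            fraction *= 2.0;
        }
        return fraction;
    }

    /** Veltkamp split: returns {lo, hi} with x == lo + hi. */
    static double[] split(double x) {
        double p = C * x;
        double q = x - p;
        double hi = p + q;
        double lo = x - hi;
        return new double[] {lo, hi};
    }

    /** Dekker product: returns {lo, hi}, hi = rounded x * y, lo = rounding error. */
    static double[] exactMult(double x, double y) {
        double hi = x * y;
        // Same threshold as the Python: below it, exactness is not guaranteed.
        if (Math.abs(hi) < 0x1p-969 && x != 0.0 && y != 0.0) {
            throw new ArithmeticException("Cannot guarantee exact result.");
        }
        double[] xs = split(x);
        double[] ys = split(y);
        double xl = xs[0], xh = xs[1], yl = ys[0], yh = ys[1];
        // Java evaluates this sum left to right, like the Python expression.
        double lo = -hi + xh * yh + xh * yl + xl * yh + xl * yl;
        return new double[] {lo, hi};
    }

    /** Method 1: compare odd parts. */
    static boolean divisionExactMethod1(double x, double y) {
        return oddPart(x) % oddPart(y) == 0.0;
    }

    /** Method 2: multiply back exactly and compare with x. */
    static boolean divisionExactMethod2(double x, double y) {
        double z = x / y;
        double[] product = exactMult(y, z);
        return product[1] == x && product[0] == 0.0;
    }

    public static void main(String[] args) {
        System.out.println(divisionExactMethod1(1.0, 2.0)); // true
        System.out.println(divisionExactMethod2(1.0, 2.0)); // true
        System.out.println(divisionExactMethod1(1.0, 3.0)); // false
        System.out.println(divisionExactMethod2(1.0, 3.0)); // false
    }
}
```

The two helper results are returned as two-element arrays only to keep the sketch short; a small value class would read better in real code.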
    

Answer 2 (score: 1)

I think your p % q == 0 solution doesn't work: it checks whether p is evenly divisible by q, i.e. whether p / q is an integer. For example, 1.0 % 2.0 == 1.0, although the quotient can be represented exactly as a double: 1.0 / 2.0 == 0.5.
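
The counterexample can be checked directly in Java. The snippet below is only an illustration; the class and method names are hypothetical.

```java
final class ModulusIsNotExactness {
    /** The test proposed in the question: true only when p / q is a whole number. */
    static boolean remainderIsZero(double p, double q) {
        return p % q == 0.0;
    }

    public static void main(String[] args) {
        System.out.println(1.0 % 2.0); // prints 1.0
        System.out.println(1.0 / 2.0); // prints 0.5
        // The remainder test rejects 1.0 / 2.0 even though 0.5 is exact.
        System.out.println(remainderIsZero(1.0, 2.0)); // prints false
    }
}
```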

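Interestingly, IEEE 754, which is also the basis for Java's floating-point implementation, has exactly what you need. If a floating-point operation yields an inexact result, an exception is raised (in the IEEE sense), which by default updates a status word, so you can check whether a result was exact by inspecting that status word. Unfortunately, Java chose not to make this status accessible.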

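If you want to stay with Java, you'll have to either use assylias's BigDecimal-based solution or try to access these exception flags via JNI: in C (C99 onwards) you can test whether the result was exact with fetestexcept(FE_INEXACT). I don't know whether this will work, though.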