Question

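I am currently investigating robust methods for summing arrays and have implemented the algorithm published by Shewchuk in "Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates". While the implementation works as expected with gfortran, ifort optimizes the countermeasures away.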

To give some context, here is my code:

module test_mod
contains
  function shewchukSum( array ) result(res)
    implicit none
    real,intent(in) :: array(:)
    real            :: res
    integer         :: xIdx, yIdx, i, nPartials
    real            :: partials(100), hi, lo, x, y

    nPartials = 0
    do xIdx=1,size(array)
      i = 0
      x = array(xIdx)

      ! Calculate the partial sums
      do yIdx=1,nPartials
        y = partials(yIdx)
        hi = x + y
        if ( abs(x) < abs(y) ) then
          lo = x - (hi - y)
        else
          lo = y - (hi - x)
        endif
        x = hi

        ! If a round-off error occurred, store it. Exact comparison intended
        if ( lo == 0. ) cycle
        i = i + 1 ; partials(i) = lo
      enddo ! yIdx
      nPartials = i + 1 ; partials( nPartials ) = x
    enddo ! xIdx

    res = sum( partials(:nPartials) )
  end function
end module

The calling test program is

program test
  use test_mod
  implicit none
  print *,        sum([1.e0, 1.e16, 1.e0, -1.e16])
  print *,shewchukSum([1.e0, 1.e16, 1.e0, -1.e16])
end program

Compiling with gfortran produces the correct result at every optimization level:

./a.out 
   0.00000000    
   2.00000000

However, ifort produces zero at every optimization level above -O0:

./a.out 
   0.00000000
   0.00000000

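I tried to debug the code, went down to the assembly level, and found that ifort optimizes away both the calculation of lo and the operations after if ( lo == 0. ) cycle.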

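Is it possible to force ifort to perform the complete operation at all optimization levels? This summation is a critical part of the calculation, and I want it to run as fast as possible. For comparison, gfortran -O2 executes this code roughly 8 to 10 times faster than ifort at {{1}} (measured for arrays longer than 100k elements).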

Answer 1

When it comes to floating-point arithmetic, ifort's defaults generally favor performance over strict correctness.

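There are a number of options that control floating-point behavior. With ifort 16 and the option -assume protect_parens, I get the expected behavior even at higher optimization levels.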

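In addition, the options -fp-model precise -fp-model source (the latter implies -assume protect_parens) may also be of interest to you. The default for -fp-model is fast=1, which allows value-unsafe optimizations.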
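As a sketch of why a value-unsafe optimization can remove the correction term: assume the optimizer is allowed to treat floating-point operations like exact real arithmetic (this is an assumption about the mechanism, consistent with the observation in the question but not verified against ifort's output). Here fl(.) denotes a rounded single-precision operation, and the branch shown is the one taken when abs(x) < abs(y).

```latex
\begin{aligned}
\text{as written:}\quad
  & hi = \mathrm{fl}(x + y), \qquad
    lo = \mathrm{fl}\bigl(x - \mathrm{fl}(hi - y)\bigr) \\
\text{after algebraic simplification:}\quad
  & lo = x - \bigl((x + y) - y\bigr) = x - x = 0 \\
\text{example } (x = 1,\; y = 10^{16}):\quad
  & hi = \mathrm{fl}(1 + y) = y, \qquad
    lo = \mathrm{fl}\bigl(1 - \mathrm{fl}(y - y)\bigr) = 1
\end{aligned}
```

With rounding respected, lo recovers the part of x that was lost in hi. Once the expression is simplified algebraically, lo is always zero, so the test if ( lo == 0. ) cycle always succeeds and everything after it becomes dead code.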

Of course, these options may affect performance, so other options that govern floating-point behavior are also worth considering.

Further details can be found in an Intel publication.
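One way to check whether a given set of flags keeps the compensation step is a small driver that fails loudly when the result is wrong. The following is only a sketch: the file names check_flags.f90 and test_mod.f90 are hypothetical, the program relies on the module test_mod from the question, and the compile lines in the comments simply combine -O2 with the flags mentioned above.

```fortran
! check_flags.f90 (hypothetical file name)
! Build together with the module from the question (assumed to be saved
! as test_mod.f90), for example:
!   ifort -O2 -assume protect_parens test_mod.f90 check_flags.f90
!   ifort -O2 -fp-model precise      test_mod.f90 check_flags.f90
program check_flags
  use test_mod, only: shewchukSum
  implicit none
  real :: res

  ! Same input as the test program in the question; the compensated sum is 2.
  res = shewchukSum([1.e0, 1.e16, 1.e0, -1.e16])

  ! Exact comparison intended: 2.0 is exactly representable.
  if (res == 2.0) then
    print *, 'compensation kept, result = ', res
  else
    print *, 'compensation optimized away, result = ', res
    error stop 1
  end if
end program check_flags
```

Because the program stops with a nonzero code on failure, it can be run after each build to compare flag combinations before timing them.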
