Question

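I have a complex program that uses std::array<double, N> for small values of N. It reads values from these arrays with operator[].

I found that GCC 6.1 with -O2 or -O3 does not inline these calls, which makes the C++ arrays slower than their C equivalents.

Here is the generated assembly: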

340 <std::array<double, 8ul>::operator[](unsigned long) const>:

340:  48 8d 04 f7             lea    (%rdi,%rsi,8),%rax
344:  c3                      retq   
345:  90                      nop
346:  66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
34d:  00 00 00

The same code is emitted for arrays of every size (since there is no bounds checking).
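To make the "no bounds checking" point concrete, here is a minimal sketch. The helper names element_address and at_rejects are illustrative and not from the question. The first helper mirrors what the single lea above computes (base address plus index times 8 bytes), and the second contrasts it with the checked accessor at(), which throws std::out_of_range for an invalid index.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Illustrative helper (not from the question): the address computation that
// an unchecked operator[] boils down to, i.e. base + i * sizeof(double).
const double* element_address(const std::array<double, 8>& a, std::size_t i)
{
    return a.data() + i;
}

// Illustrative helper: reports whether the checked accessor at() rejects i.
bool at_rejects(const std::array<double, 8>& a, std::size_t i)
{
    try {
        (void)a.at(i);   // at() validates the index and may throw
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

Because operator[] performs no such validation, its body does not depend on N, which is consistent with identical code being emitted for every array size.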

A loop over such an array looks like this:

4c0:  e8 7b fe ff ff          callq  340 <std::array<double, 8ul>::operator[](unsigned long) const>
4c5:  be 07 00 00 00          mov    $0x7,%esi
4ca:  4c 89 f7                mov    %r14,%rdi
4cd:  48 89 44 24 78          mov    %rax,0x78(%rsp)

...6 more copies of this...

4d2:  e8 69 fe ff ff          callq  340 <std::array<double, 8ul>::operator[](unsigned long) const>
4d7:  48 89 44 24 70          mov    %rax,0x70(%rsp)
4dc:  31 f6                   xor    %esi,%esi
4de:  4c 89 ef                mov    %r13,%rdi

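This is obviously bad. The problem is that small test programs do not trigger this behavior.

So my question is: how can I get GCC to tell me why it is not inlining these single-instruction calls, and/or make it inline them? Obviously, I cannot modify the <array> header file to add __attribute__((inline)).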
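For context, GCC does provide options and attributes aimed at this kind of investigation. The sketch below is an assumption-laden illustration, not something taken from this thread: the file name sum.cpp and the function sum8 are hypothetical, and how much the listed options report about a given inlining decision varies by GCC version. The flatten attribute asks GCC to inline every call inside the marked function where possible; whether it helps in the situation described here is untested.

```cpp
// Hypothetical file name: sum.cpp. Possible diagnostic builds with GCC:
//   g++ -O3 -c sum.cpp -fopt-info-inline-missed   (report missed inlining)
//   g++ -O3 -c sum.cpp -fdump-ipa-inline          (write the inliner's dump file)
#include <array>
#include <cstddef>

#if defined(__GNUC__)
#  define FLATTEN __attribute__((flatten))  // request inlining of calls in the body
#else
#  define FLATTEN                           // no-op on other compilers
#endif

// sum8 is an illustrative function, not from the question.
FLATTEN double sum8(const std::array<double, 8>& a)
{
    double total = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        total += a[i];   // the operator[] call we would like to see inlined
    return total;
}
```

The attribute goes on the caller, so it needs no changes to the <array> header.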

Answer 1

There appears to be a bug in the optimizer of GCC 5 and 6 that shows up when __attribute__((optimize("unroll-loops"))) is combined with -ffast-math or related options.

This code reproduces the bug when compiled with -O3 -ffast-math:

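#include <array>
#include <cstddef>

typedef std::array<double, 2> Array;

// Per-function optimize attribute; combined with -ffast-math it triggers the bug.
void foo(Array& a) __attribute__((optimize("unroll-loops")));

void foo(Array& a)
{
  for (std::size_t ii = 0; ii < a.size(); ++ii)
    a[ii] = 1.0;  // operator[] is emitted as a call instead of being inlined
}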

Without -ffast-math, or when compiled with GCC 4.9, GCC 7 or later, or Clang, it works fine.
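If upgrading the compiler or dropping the flags is not an option, one possible workaround is to keep operator[] out of the hot loop altogether. This is a sketch that is not part of the original answer and has not been tested against the affected GCC versions; fill_ones is an illustrative name. It fetches the raw pointer and the size once and then uses plain pointer indexing, so the loop body contains no member-function call that would need inlining.

```cpp
#include <array>
#include <cstddef>

typedef std::array<double, 2> Array;

// fill_ones is an illustrative variant of foo from the example above.
void fill_ones(Array& a)
{
    double* p = a.data();              // one call, made outside the loop
    const std::size_t n = a.size();    // size read once, outside the loop
    for (std::size_t ii = 0; ii < n; ++ii)
        p[ii] = 1.0;                   // built-in pointer indexing, no operator[]
}
```

Even if data() and size() are not inlined, each is called only once per invocation rather than once per element.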

GCC sometimes does not inline std::array::operator[]
