Question

Consider the following code:

// foo.cxx
int last;

int next() {
  return ++last;
}

int index(int scale) {
  return next() << scale;
}

When compiled with gcc 7.2:

$ g++ -std=c++11 -O3 -fPIC

This emits:

next():
    movq    last@GOTPCREL(%rip), %rdx
    movl    (%rdx), %eax
    addl    $1, %eax
    movl    %eax, (%rdx)
    ret
index(int):
    pushq   %rbx
    movl    %edi, %ebx
    call    next()@PLT    ## next() not inlined, call through PLT
    movl    %ebx, %ecx
    sall    %cl, %eax
    popq    %rbx
    ret

However, when the same code is compiled with clang 3.9 instead:

next():                               # @next()
    movq    last@GOTPCREL(%rip), %rcx
    movl    (%rcx), %eax
    incl    %eax
    movl    %eax, (%rcx)
    retq

index(int):                              # @index(int)
    movq    last@GOTPCREL(%rip), %rcx
    movl    (%rcx), %eax
    incl    %eax              ## next() was inlined!
    movl    %eax, (%rcx)
    movl    %edi, %ecx
    shll    %cl, %eax
    retq

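gcc calls next() through the PLT, while clang inlines it. Both still look up last through the GOT. When compiling for Linux, is clang right to make this optimization, with gcc missing out on an easy inline? Is clang wrong to make it? Or is this purely a QoI (quality of implementation) issue?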

Answer 1

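I don't think the standard goes into that much detail. It says, roughly, that a symbol with external linkage in different translation units is the same symbol. That makes clang's version correct.

Beyond that point, as far as I know, we are outside the standard. Compilers differ in what they consider useful -fPIC output.

Note that g++ -c -std=c++11 -O3 -fPIE outputs: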

0000000000000000 <_Z4nextv>:
   0:   8b 05 00 00 00 00       mov    0x0(%rip),%eax        # 6 <_Z4nextv+0x6>
   6:   83 c0 01                add    $0x1,%eax
   9:   89 05 00 00 00 00       mov    %eax,0x0(%rip)        # f <_Z4nextv+0xf>
   f:   c3                      retq   

0000000000000010 <_Z5indexi>:
  10:   8b 05 00 00 00 00       mov    0x0(%rip),%eax        # 16 <_Z5indexi+0x6>
  16:   89 f9                   mov    %edi,%ecx
  18:   83 c0 01                add    $0x1,%eax
  1b:   89 05 00 00 00 00       mov    %eax,0x0(%rip)        # 21 <_Z5indexi+0x11>
  21:   d3 e0                   shl    %cl,%eax
  23:   c3                      retq

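So GCC does know how to optimize this. It just chooses not to when -fPIC is used. But why? I can see only one explanation: to make it possible to override the symbol during dynamic linking and to see the effect consistently. This technique is known as symbol interposition.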

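In a shared library, if index calls next, and next is globally visible, gcc has to consider the possibility that next is interposed. So it goes through the PLT. With -fPIE, however, symbols cannot be interposed, so gcc enables the optimization.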
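To make the idea concrete, here is a minimal sketch of what an interposing definition could look like. The file name, library names, the ./app binary and the build commands in the comments are assumptions for illustration; none of them come from the original post.

```cpp
// interpose.cxx (hypothetical file name)
//
// One possible way to try it, assuming foo.cxx was built into libfoo.so
// and ./app is dynamically linked against it:
//   g++ -std=c++11 -O3 -fPIC -shared interpose.cxx -o libinterpose.so
//   LD_PRELOAD=./libinterpose.so ./app

// Same name and signature as next() in foo.cxx, so it gets the same
// mangled symbol (_Z4nextv) and can take its place at dynamic link time.
int next() {
  static int counter = 1000;  // start in a distinct range so the override is easy to spot
  return ++counter;
}
```

With a libfoo.so built by gcc -fPIC, index() calls next() through the PLT, so it would be expected to pick up this replacement. With the clang output shown in the question, index() carries its own inlined copy of the increment, so the replacement would only affect callers outside the library.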

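So is clang wrong? No. But gcc seems to provide better support for symbol interposition, which is handy for instrumenting code. It does so at the cost of some overhead if one builds an executable with -fPIC instead of -fPIE, though.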

Additional notes:

In this blog entry from one of the gcc developers, he mentions, toward the end of the post:

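"While comparing some benchmarks to clang, I noticed that clang actually ignores the ELF interposition rules. While it is a bug, I decided to add the -fno-semantic-interposition flag to GCC to get similar behaviour. If interposition is not desirable, ELF's official answer is to use hidden visibility and, if the symbol needs to be exported, to define an alias. This is not always a practical thing to do by hand."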

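Following that lead, I landed on the x86-64 ABI spec. Section 3.5.5 does mandate that every call to a globally visible function go through the PLT (it goes as far as defining the exact instruction sequence to use, depending on the memory model).

So, although it does not violate the C++ standard, ignoring semantic interposition seems to violate the ABI.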

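One last word: I didn't know where to put this, but it might be of interest to you. I'll spare you the dumps, but my tests with objdump and various compiler options showed the following.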

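On the gcc side of things:

gcc -fPIC: accesses to last go through the GOT, calls to next() go through the PLT.
gcc -fPIC -fno-semantic-interposition: last goes through the GOT, next() is inlined.
gcc -fPIE: last is IP-relative, next() is inlined.
-fPIE implies -fno-semantic-interposition.

On the clang side of things:

clang -fPIC: last goes through the GOT, next() is inlined.
clang -fPIE: last goes through the GOT, next() is inlined.

And a modified version that compiles to IP-relative accesses and is inlined on both compilers: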

// foo.cxx
int last_ __attribute__((visibility("hidden")));
extern int last __attribute__((alias("last_")));

int __attribute__((visibility("hidden"))) next_()
{
  return ++last_;
}
// This one is ugly, because alias needs the mangled name. Could extern "C" next_ instead.
extern int next() __attribute__((alias("_Z5next_v")));

int index(int scale) {
  return next_() << scale;
}

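Basically, this explicitly states that, although the symbols are made available globally, we use hidden versions of them that are immune to any kind of interposition. Both compilers then fully optimize the accesses, regardless of the options passed.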
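The comment in the modified version notes that extern "C" could be used to avoid spelling out a mangled alias target. A minimal sketch of that variant follows, assuming GCC or clang on an ELF target. The names last_impl and next_impl and the file name are made up for illustration and are not from the original answer.

```cpp
// foo_alias.cxx (hypothetical file name)
// Assumes GCC or clang targeting ELF; the alias attribute is not portable.

extern "C" {
// Hidden definitions with unmangled names: they are not exported from
// the shared object, so they cannot be interposed.
__attribute__((visibility("hidden"))) int last_impl;

__attribute__((visibility("hidden"))) int next_impl() {
  return ++last_impl;
}
}

// Exported names for other modules. Each one is an alias for a hidden
// definition, and the alias target is now a plain, unmangled name.
extern int last __attribute__((alias("last_impl")));
int next() __attribute__((alias("next_impl")));

// Calls the hidden name directly, so the compiler is free to inline it.
int index(int scale) {
  return next_impl() << scale;
}
```

The exported next() and the hidden next_impl() share one body and one counter, so callers inside and outside the library see the same sequence unless someone interposes next(), in which case only external callers are affected.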

gcc vs clang: inlining a function with -fPIC
