Question

When I compile the following code with gcc 6 -O3 -std=c++14, I get a nice, empty main:

Dump of assembler code for function main():
   0x00000000004003e0 <+0>:     xor    %eax,%eax
   0x00000000004003e2 <+2>:     retq

But uncommenting the last line in main "breaks" the optimization:

Dump of assembler code for function main():
   0x00000000004005f0 <+0>:     sub    $0x78,%rsp
   0x00000000004005f4 <+4>:     lea    0x40(%rsp),%rdi
   0x00000000004005f9 <+9>:     movq   $0x400838,0x10(%rsp)
   0x0000000000400602 <+18>:    movb   $0x0,0x18(%rsp)
   0x0000000000400607 <+23>:    mov    %fs:0x28,%rax
   0x0000000000400610 <+32>:    mov    %rax,0x68(%rsp)
   0x0000000000400615 <+37>:    xor    %eax,%eax
   0x0000000000400617 <+39>:    movl   $0x0,(%rsp)
   0x000000000040061e <+46>:    movq   $0x400838,0x30(%rsp)
   0x0000000000400627 <+55>:    movb   $0x0,0x38(%rsp)
   0x000000000040062c <+60>:    movl   $0x0,0x20(%rsp)
   0x0000000000400634 <+68>:    movq   $0x400838,0x50(%rsp)
   0x000000000040063d <+77>:    movb   $0x0,0x58(%rsp)
   0x0000000000400642 <+82>:    movl   $0x0,0x40(%rsp)
   0x000000000040064a <+90>:    callq  0x400790 <ErasedObject::~ErasedObject()>
   0x000000000040064f <+95>:    lea    0x20(%rsp),%rdi
   0x0000000000400654 <+100>:   callq  0x400790 <ErasedObject::~ErasedObject()>
   0x0000000000400659 <+105>:   mov    %rsp,%rdi
   0x000000000040065c <+108>:   callq  0x400790 <ErasedObject::~ErasedObject()>
   0x0000000000400661 <+113>:   mov    0x68(%rsp),%rdx
   0x0000000000400666 <+118>:   xor    %fs:0x28,%rdx
   0x000000000040066f <+127>:   jne    0x400678 <main()+136>
   0x0000000000400671 <+129>:   xor    %eax,%eax
   0x0000000000400673 <+131>:   add    $0x78,%rsp
   0x0000000000400677 <+135>:   retq   
   0x0000000000400678 <+136>:   callq  0x4005c0 <__stack_chk_fail@plt>

Code

#include <type_traits>
#include <new>

namespace
{
struct ErasedTypeVTable
{
   using destructor_t = void (*)(void *obj);

   destructor_t dtor;
};

template <typename T>
void dtor(void *obj)
{
   return static_cast<T *>(obj)->~T();
}

template <typename T>
static const ErasedTypeVTable erasedTypeVTable = {
   &dtor<T>
};
}

struct ErasedObject
{
   std::aligned_storage<sizeof(void *)>::type storage;
   const ErasedTypeVTable& vtbl;
   bool flag = false;

   template <typename T, typename S = typename std::decay<T>::type>
   ErasedObject(T&& obj)
   : vtbl(erasedTypeVTable<S>)
   {
      static_assert(sizeof(T) <= sizeof(storage) && alignof(T) <= alignof(decltype(storage)), "");
      new (object()) S(std::forward<T>(obj));
   }

   ErasedObject(ErasedObject&& other) = default;

   ~ErasedObject()
   {
      if (flag)
      {
         ::operator delete(object());
      }
      else
      {
         vtbl.dtor(object());
      }
   }

   void *object()
   {
      return reinterpret_cast<char *>(&storage);
   }
};

struct myType
{
   int a;
};

int main()
{
   ErasedObject c1(myType{});
   ErasedObject c2(myType{});
   //ErasedObject c3(myType{});
}

clang optimizes both versions.

Any idea what is going on? Am I hitting some optimization limit? If so, is it configurable?

Answer 1

I ran g++ with -fdump-ipa-inline to get more information about which functions are inlined and which are not.

For the test case where main() creates three objects, I got:

  (...)
  Deciding on inlining of small functions.  Starting with size 35.
  Enqueueing calls in void {anonymous}::dtor(void*) [with T = myType]/40.
  Enqueueing calls in int main()/35.
    not inlinable: int main()/35 -> ErasedObject::~ErasedObject()/33, call is unlikely and code size would grow
    not inlinable: int main()/35 -> ErasedObject::~ErasedObject()/33, call is unlikely and code size would grow
    not inlinable: int main()/35 -> ErasedObject::~ErasedObject()/33, call is unlikely and code size would grow
  (...)

This failure reason is set in gcc/gcc/ipa-inline.c:

  else if (!e->maybe_hot_p ()
           && (growth >= MAX_INLINE_INSNS_SINGLE
               || growth_likely_positive (callee, growth)))
    {
      e->inline_failed = CIF_UNLIKELY_CALL;
      want_inline = false;
    }
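To make the branch easier to follow, here is a simplified model of the decision it makes. It is a hypothetical sketch, not GCC code: `CallSite`, `rejectAsUnlikelyCall` and the `maxInlineInsnsSingle` argument are illustrative stand-ins for `e->maybe_hot_p ()`, the estimated `growth`, `growth_likely_positive (callee, growth)` and `MAX_INLINE_INSNS_SINGLE`.

```cpp
#include <cassert>

// Hypothetical stand-in for the inputs the quoted branch looks at (not GCC's types).
struct CallSite {
    bool maybeHot;              // plays the role of e->maybe_hot_p ()
    int growth;                 // estimated size growth if this call is inlined
    bool growthLikelyPositive;  // plays the role of growth_likely_positive (callee, growth)
};

// Returns true when the call would be rejected with the reason
// "call is unlikely and code size would grow" (CIF_UNLIKELY_CALL).
inline bool rejectAsUnlikelyCall(const CallSite& e, int maxInlineInsnsSingle) {
    assert(maxInlineInsnsSingle > 0);  // the limit is assumed to be a positive size budget
    return !e.maybeHot
        && (e.growth >= maxInlineInsnsSingle || e.growthLikelyPositive);
}
```

The model makes one thing explicit: once a call site is judged not hot, a growth estimate that is merely likely to be positive is enough to reject it, however small that growth is. Presumably, marking the caller hot is what makes `maybe_hot_p ()` true for these call sites, which would explain why that alone lets the destructors be inlined.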

Then I found that the smallest change that makes g++ inline these functions is to add a declaration:

int main() __attribute__((hot));
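A self-contained sketch of where such a declaration goes relative to the definition. It assumes a trimmed-down, hypothetical version of the question's holder (`Erased`, `VTable`, `destroy` and `vtableFor` are illustrative names), a hypothetical counting payload `Counted`, and puts the attribute on a hypothetical function `makeThree()` instead of `main()` so the snippet can be called and checked. `__attribute__((hot))` is a GCC extension (clang accepts it too). Whether the destructor calls are actually inlined still depends on the compiler version and flags, so confirm it in the generated assembly.

```cpp
#include <cassert>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical, trimmed-down variant of the question's type-erased holder:
// only the vtable-based destructor path is kept.
struct VTable {
    void (*dtor)(void*);
};

template <typename T>
void destroy(void* obj) { static_cast<T*>(obj)->~T(); }

template <typename T>
const VTable vtableFor = { &destroy<T> };

struct Erased {
    std::aligned_storage<sizeof(void*)>::type storage;
    const VTable& vtbl;

    template <typename T, typename S = typename std::decay<T>::type>
    explicit Erased(T&& obj) : vtbl(vtableFor<S>) {
        static_assert(sizeof(S) <= sizeof(storage) && alignof(S) <= alignof(decltype(storage)),
                      "payload must fit in the inline storage");
        new (&storage) S(std::forward<T>(obj));  // construct the payload in place
    }
    Erased(const Erased&) = delete;
    Erased& operator=(const Erased&) = delete;

    ~Erased() {
        assert(vtbl.dtor != nullptr);  // sanity check on the erased vtable
        vtbl.dtor(&storage);           // indirect call the inliner has to see through
    }
};

// Hypothetical payload that tracks how many instances are alive.
struct Counted {
    static int live;
    int a = 0;
    Counted() { ++live; }
    Counted(const Counted& other) : a(other.a) { ++live; }
    ~Counted() { --live; }
};
int Counted::live = 0;

// The attribute sits on a declaration that precedes the definition,
// the same way `int main() __attribute__((hot));` is used above.
int makeThree() __attribute__((hot));

int makeThree() {
    Erased c1(Counted{});
    Erased c2(Counted{});
    Erased c3(Counted{});
    return Counted::live;  // evaluated before c3, c2, c1 are destroyed
}
```

Calling `makeThree()` returns 3 (three payloads alive inside the function) and leaves `Counted::live` at 0 afterwards, so the type-erased destructors do run; the attribute only changes how the optimizer weighs the calls, not what the program does.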

I couldn't find in the code why int main() is not considered hot, but that is probably best left for another question.

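The second part of the condition quoted above is more interesting. Its intent is to avoid inlining when the code would grow, yet you have produced an example where the code actually shrinks once inlining is done.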

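I think this is worth reporting on GCC's Bugzilla, but I'm not sure you can call it a bug: estimating the effect of inlining is a heuristic, so it is expected to work in most cases, not in all of them.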

Is this a bug in the gcc optimizer?