Question

I have a very long for loop (in terms of iteration count), and I would like to make some parts of it customizable. The code looks like this:

void expensive_loop( void (*do_true)(int),  void (*do_false)(int)){
    for(int i=0; i<VeryLargeN; i++){
       int element=elements[i];
       // long computation that produces a boolean condition
       if (condition){ 
         do_true(element); 
       }else{
         do_false(element);
       }
    }
}

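Now, the problem is that every call to do_true and do_false incurs overhead from pushing to and popping from the stack, which ruins the performance of the code.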

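To solve this I could simply create several copies of the expensive_loop function, each with its own do_true and do_false implementation. That would make the code impossible to maintain.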

So, how can the inner part of the iteration be made customizable while still keeping high performance?

Answer 1

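Note that the function accepts pointers to functions, so it calls them through those pointers. The optimizer may inline calls made through function pointers if the definitions of expensive_loop and of the pointed-to functions are available and the compiler's inlining limits have not been exceeded.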
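As a minimal sketch of the situation described above, the following keeps the callbacks and expensive_loop in one translation unit, so the optimizer can see every definition at the call site and may inline the indirect calls. The names on_true, on_false, and run_example, the explicit data parameters, and the placeholder condition are assumptions made for illustration; whether inlining actually happens depends on the compiler and its flags.

```cpp
#include <cstddef>

// Hypothetical counters so the callbacks have an observable effect.
static int true_count = 0;
static int false_sum = 0;

// Callbacks defined in the same translation unit as the loop.
static void on_true(int) { ++true_count; }
static void on_false(int element) { false_sum += element; }

// Function-pointer version, as in the question (data passed explicitly).
static void expensive_loop(const int* elements, std::size_t n,
                           void (*do_true)(int), void (*do_false)(int)) {
    for (std::size_t i = 0; i < n; ++i) {
        int element = elements[i];
        bool condition = element > 0;  // placeholder for the long computation
        if (condition) {
            do_true(element);
        } else {
            do_false(element);
        }
    }
}

// Call site with constant function pointers: once expensive_loop is inlined
// here, the optimizer knows the call targets and may inline them as well.
int run_example() {
    const int data[] = {3, -1, 4, -5};
    true_count = 0;
    false_sum = 0;
    expensive_loop(data, 4, on_true, on_false);
    return true_count * 100 + false_sum;  // two positives; negatives sum to -6
}
```

Inspecting the generated assembly is the reliable way to check whether the calls were actually inlined for a given compiler and optimization level.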

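Another option is to make this algorithm a function template that accepts callable objects (function pointers, objects with a call operator, lambdas), just as the standard algorithms do. That way the compiler may have more optimization opportunities. E.g.: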

template<class DoTrue, class DoFalse>
void expensive_loop(DoTrue do_true, DoFalse do_false) { 
    // Original function body here.
}
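As a rough sketch of how such a template might be filled in and called, the example below passes two lambdas. The elements parameter, the even/odd placeholder condition, and the helper count_evens_and_odds are assumptions for illustration and do not come from the original question.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Same shape as the template above, with the data passed in explicitly.
// The "long computation" is replaced by a trivial even/odd test (an assumption).
template<class DoTrue, class DoFalse>
void expensive_loop(const std::vector<int>& elements, DoTrue do_true, DoFalse do_false) {
    for (std::size_t i = 0; i < elements.size(); ++i) {
        int element = elements[i];
        bool condition = (element % 2 == 0);  // placeholder condition
        if (condition) {
            do_true(element);
        } else {
            do_false(element);
        }
    }
}

// Hypothetical caller: counts even and odd elements using two lambdas.
// Each lambda has its own unique type, so the call targets are known
// when the template is instantiated.
std::pair<int, int> count_evens_and_odds(const std::vector<int>& elements) {
    int evens = 0;
    int odds = 0;
    expensive_loop(elements,
                   [&evens](int) { ++evens; },
                   [&odds](int) { ++odds; });
    return {evens, odds};
}
```

Because the callables are template parameters rather than run-time pointers, each call site yields its own instantiation of the loop, which gives the optimizer the chance to inline the bodies.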

See g++'s -Winline compiler switch:

-Winline

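Warn if a function that is declared as inline cannot be inlined. Even with this option, the compiler does not warn about failures to inline functions declared in system headers.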

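The compiler uses a variety of heuristics to determine whether or not to inline a function. For example, the compiler takes into account the size of the function being inlined and the amount of inlining that has already been done in the current function. Therefore, seemingly insignificant changes in the source program can cause the warnings produced by -Winline to appear or disappear.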

It probably does not warn about a function not being inlined when that function is called through a pointer.

Answer 2

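The problem is that the function addresses (what is actually stored in do_true and do_false) are not resolved until link time, at which point there are few opportunities left for optimization.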

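If you set both functions explicitly in your code (i.e., the functions themselves do not come from an external library or the like), you can declare your function using C++ templates, so that the compiler knows exactly which functions you want to call at that point.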

struct function_one {
  void operator()( int element ) {
  }
};

extern int elements[];
extern bool condition();

template < typename DoTrue, typename DoFalse >
void expensive_loop(){
  DoTrue do_true;
  DoFalse do_false;

  for(int i=0; i<50; i++){
    int element=elements[i];
    // long computation that produce a boolean condition
    if (condition()){ 
      do_true(element); // call DoTrue's operator()
    }else{
      do_false(element); // call DoFalse's operator()
    }
  }
}

int main( int argc, char* argv[] ) {
    expensive_loop<function_one,function_one>();

    return 0;
}

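The compiler will instantiate a separate expensive_loop function for each combination of DoTrue and DoFalse types you specify. This increases the size of the executable if you use more than one combination, but each instantiation should do what you expect.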
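To make the "one instantiation per combination" point concrete, here is a small sketch with two functor types used in both orders. The functors count_plus and count_minus, the global counters, the fixed four-element array, and the placeholder condition are all hypothetical and serve only to make the effect observable.

```cpp
// Hypothetical counters so the effect of each functor is observable.
int plus_calls = 0;
int minus_calls = 0;

struct count_plus {
    void operator()(int) const { ++plus_calls; }
};

struct count_minus {
    void operator()(int) const { ++minus_calls; }
};

// Same structure as the template above, but over a small fixed array and
// with a trivial placeholder condition instead of the long computation.
template <typename DoTrue, typename DoFalse>
void expensive_loop(const int (&elements)[4]) {
    DoTrue do_true;
    DoFalse do_false;
    for (int element : elements) {
        if (element > 0) {
            do_true(element);   // calls DoTrue's operator()
        } else {
            do_false(element);  // calls DoFalse's operator()
        }
    }
}

// Two different type combinations, so the compiler generates two
// separate instantiations of expensive_loop.
void run_both() {
    const int data[4] = {1, -2, 3, 4};
    plus_calls = 0;
    minus_calls = 0;
    expensive_loop<count_plus, count_minus>(data);  // instantiation #1
    expensive_loop<count_minus, count_plus>(data);  // instantiation #2
}
```

Each instantiation is compiled with its call targets fixed at compile time, which is what allows the functor bodies to be inlined into the loop.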

In the example I have shown, note that the function body is empty. The compiler simply strips out the function calls and leaves the loop:

main:
    push    rbx
    mov     ebx, 50
.L2:
    call    condition()
    sub     ebx, 1
    jne     .L2
    xor     eax, eax
    pop     rbx
    ret

See the example at https://godbolt.org/g/hV52Nn

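With function pointers, as in your example, the function calls may not be inlined. Below is the assembly generated for expensive_loop in a program where expensive_loop and main are defined in one file: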

// File A.cpp
void foo( int arg );
void bar( int arg );

extern bool condition();
extern int elements[];

void expensive_loop( void (*do_true)(int),  void (*do_false)(int)){
    for(int i=0; i<50; i++){
       int element=elements[i];
       // long computation that produce a boolean condition
       if (condition()){
         do_true(element);
       }else{
         do_false(element);
       }
    }
}

int main( int argc, char* argv[] ) {
    expensive_loop( foo, bar );

    return 0;
}

and the functions passed as arguments are defined in another:

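// File B.cpp
int elements[50];
int result;  // global sink so the computations below have an observable effect

bool condition() {
    return elements[0] == 1;
}

// Signatures match the declarations in A.cpp. No `inline` here: an inline
// function must be defined in every translation unit that uses it.
void foo( int arg ) {
    result = arg%3;
}

void bar( int arg ) {
    result = arg != 0 ? 1234%arg : 0;  // avoid a modulo by zero
}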

That is, they are defined in different translation units.

0000000000400620 <expensive_loop(void (*)(int), void (*)(int))>:
  400620:       41 55                   push   %r13
  400622:       49 89 fd                mov    %rdi,%r13
  400625:       41 54                   push   %r12
  400627:       49 89 f4                mov    %rsi,%r12
  40062a:       55                      push   %rbp
  40062b:       53                      push   %rbx
  40062c:       bb 60 10 60 00          mov    $0x601060,%ebx
  400631:       48 83 ec 08             sub    $0x8,%rsp
  400635:       eb 19                   jmp    400650 <expensive_loop(void (*)(int), void (*)(int))+0x30>
  400637:       66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
  40063e:       00 00
  400640:       48 83 c3 04             add    $0x4,%rbx
  400644:       41 ff d5                callq  *%r13
  400647:       48 81 fb 28 11 60 00    cmp    $0x601128,%rbx
  40064e:       74 1d                   je     40066d <expensive_loop(void (*)(int), void (*)(int))+0x4d>
  400650:       8b 2b                   mov    (%rbx),%ebp
  400652:       e8 79 ff ff ff          callq  4005d0 <condition()>
  400657:       84 c0                   test   %al,%al
  400659:       89 ef                   mov    %ebp,%edi
  40065b:       75 e3                   jne    400640 <expensive_loop(void (*)(int), void (*)(int))+0x20>
  40065d:       48 83 c3 04             add    $0x4,%rbx
  400661:       41 ff d4                callq  *%r12
  400664:       48 81 fb 28 11 60 00    cmp    $0x601128,%rbx
  40066b:       75 e3                   jne    400650 <expensive_loop(void (*)(int), void (*)(int))+0x30>
  40066d:       48 83 c4 08             add    $0x8,%rsp
  400671:       5b                      pop    %rbx
  400672:       5d                      pop    %rbp
  400673:       41 5c                   pop    %r12
  400675:       41 5d                   pop    %r13
  400677:       c3                      retq
  400678:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  40067f:       00

Even at the -O3 optimization level, you can see that the calls are still made indirectly through a register:

400644:       41 ff d5                callq  *%r13
