Question

Consider the following code:

#include <algorithm>
#include <chrono>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
    std::vector<int> v(12);
    std::iota(v.begin(), v.end(), 0);

    //std::next_permutation(v.begin(), v.end());

    using clock = std::chrono::high_resolution_clock;
    clock c;
    auto start = c.now();

    unsigned long counter = 0;
    do {
        ++counter;
    } while (std::next_permutation(v.begin(), v.end()));

    auto end = c.now();
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);    
    std::cout << counter << " permutations took " << duration.count() / 1000.0f << " s";
}

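Compiled with GCC (MinGW) 5.3 at -O2 on my 4.1 GHz AMD CPU, this takes 2.3 s. However, if I uncomment the commented-out line, it slows down to 3.4 s. I would expect a minimal speed-up, since we then measure the time for one permutation fewer. With -O3 the difference is less extreme: 2.0 s versus 2.4 s.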

Can anyone explain this? Could a super-smart compiler detect that I want to traverse all permutations and optimize the code accordingly?
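To put the "one permutation fewer" expectation into numbers, here is a minimal sketch that only checks the counter values and does no timing. The helpers `factorial` and `count_permutations` are hypothetical names introduced for illustration; they are not part of the original program.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical helper: n! computed iteratively (fits in 64 bits for n <= 20).
unsigned long long factorial(unsigned n) {
    unsigned long long result = 1;
    for (unsigned i = 2; i <= n; ++i) {
        result *= i;
    }
    return result;
}

// Hypothetical helper: runs the loop from the question on n elements
// (n >= 2 assumed) and returns the final counter value.
// skip_first == true mirrors enabling the extra next_permutation call.
unsigned long long count_permutations(unsigned n, bool skip_first) {
    std::vector<int> v(n);
    std::iota(v.begin(), v.end(), 0);  // sorted start: 0, 1, ..., n-1

    if (skip_first) {
        std::next_permutation(v.begin(), v.end());
    }

    unsigned long long counter = 0;
    do {
        ++counter;
    } while (std::next_permutation(v.begin(), v.end()));
    return counter;
}
```

For 12 elements the loop visits 12! = 479001600 permutations, or 479001599 with the extra call enabled. The skipped work is therefore about two parts per billion of the total, far too little to account for a change of a full second in either direction.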

Answer 1

I think the compiler gets confused because you call the function on two separate lines of your code, which causes it not to be inlined.

GCC 8.0.0 behaves the same way as your version.

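As discussed in "Benefits of inline functions in C++?", inlining gives the compiler a simple mechanism for applying more optimizations, so losing it can cause a severe performance drop in some cases.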
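One way to probe this hypothesis is to put the one-call-site loop and the two-call-site variant into separate functions and time each. The following is only a sketch: the names `count_all`, `count_after_first`, and `time_seconds` are hypothetical, and since both variants live in one translation unit here, the compiler's inlining decisions may differ from those it makes for the original program.

```cpp
#include <algorithm>
#include <chrono>
#include <numeric>
#include <vector>

// One call site: the loop exactly as in the question.
unsigned long long count_all(std::vector<int>& v) {
    unsigned long long counter = 0;
    do {
        ++counter;
    } while (std::next_permutation(v.begin(), v.end()));
    return counter;
}

// Two call sites: one extra call before the loop, as in the slow variant.
unsigned long long count_after_first(std::vector<int>& v) {
    std::next_permutation(v.begin(), v.end());
    unsigned long long counter = 0;
    do {
        ++counter;
    } while (std::next_permutation(v.begin(), v.end()));
    return counter;
}

// Hypothetical timing helper: fills a vector with 0..n-1, runs f on it,
// stores the returned counter in `count`, and returns the elapsed seconds.
template <typename F>
double time_seconds(unsigned n, F f, unsigned long long& count) {
    std::vector<int> v(n);
    std::iota(v.begin(), v.end(), 0);

    const auto start = std::chrono::steady_clock::now();
    count = f(v);
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Calling `time_seconds(12, count_all, c1)` and `time_seconds(12, count_after_first, c2)` from `main` and printing both results gives a side-by-side comparison. If the timings differ, compiling with `-S` and checking whether the loop still contains a call to `std::next_permutation` is a more direct test of whether inlining is the cause.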

Extreme slowdown when starting at second permutation