Question

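In many cases, including single-element ones, I use std::tuple just to make my code more generic. For example, I write tuple<double> instead of double. However, I decided to check how this special case performs.

Here is a simple benchmark: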

#include <tuple>
#include <iostream>

using std::cout;
using std::endl;
using std::get;
using std::tuple;

int main(void)
{

#ifdef TUPLE
    using double_t = std::tuple<double>;
#else
    using double_t = double;
#endif

    constexpr int count = 1e9;
    auto array = new double_t[count];

    long long sum = 0;
    for (int idx = 0; idx < count; ++idx) {
#ifdef TUPLE
        sum += get<0>(array[idx]);
#else
        sum += array[idx];
#endif
    }
    delete[] array;
    cout << sum << endl; // just "external" side effect for variable sum.
}

And the results:

$ g++ -DTUPLE -O2 -std=c++11 test.cpp && time ./a.out
0  

real    0m3.347s
user    0m2.839s
sys     0m0.485s

$ g++  -O2 -std=c++11 test.cpp && time ./a.out
0  

real    0m2.963s
user    0m2.424s
sys     0m0.519s

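I thought tuple was a strictly static, compile-time template, so every get<> call here should reduce to an ordinary variable access. BTW, the memory allocation size is the same in both variants of this test. Why is there a difference in execution time?

EDIT: The problem was the initialization of the tuple<> object. To make the test more accurate, one line must be changed: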

     constexpr int count = 1e9;
-    auto array = new double_t[count];
+    auto array = new double_t[count]();

     long long sum = 0;

After that, similar results can be observed:

$ g++ -DTUPLE -g -O2 -std=c++11 test.cpp && (for i in $(seq 3); do time ./a.out; done) 2>&1 | grep real
real    0m3.342s
real    0m3.339s
real    0m3.343s

$ g++ -g -O2 -std=c++11 test.cpp && (for i in $(seq 3); do time ./a.out; done) 2>&1 | grep real
real    0m3.349s
real    0m3.339s
real    0m3.334s
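
The expectation that get<> is just an ordinary variable access can be partly checked in code. The following is a minimal sketch, not part of the original benchmark: the static_assert relies on the standard's specification that get<0> on an lvalue tuple returns a reference to the element, and the helper names sum_tuples and sum_plain are hypothetical, introduced only for illustration.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// get<0> on an lvalue tuple<double> is specified to return double&,
// so it refers to the stored element directly rather than copying it.
static_assert(
    std::is_same<decltype(std::get<0>(std::declval<std::tuple<double>&>())),
                 double&>::value,
    "get<0> should yield a reference to the element");

// Hypothetical helper: sum n tuple-wrapped doubles through get<0>.
double sum_tuples(const std::tuple<double>* data, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += std::get<0>(data[i]);
    }
    return sum;
}

// Hypothetical helper: the same loop over plain doubles.
double sum_plain(const double* data, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += data[i];
    }
    return sum;
}
```

Both helpers perform the same additions in the same order, so they return identical sums for identical values. Whether the optimizer emits identical machine code for the two loops depends on the compiler and flags, so comparing the generated assembly is still the reliable check.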

Answer 1

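The tuple default-constructs all of its values (so everything is 0), whereas the plain doubles allocated with new double_t[count] are not initialized at all.

In the generated assembly, the following initialization loop appears only when tuples are used. Otherwise the two versions are equivalent.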

.L2:
    movq    $0, (%rdx)
    addq    $8, %rdx
    cmpq    %rcx, %rdx
    jne .L2
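
The difference between the two forms of initialization can be sketched as follows. This is an illustrative example, not code from the question: the function names are hypothetical, and std::unique_ptr is used only to keep the sketch free of manual delete[] calls. Plain new double[n] (without the trailing parentheses) leaves the elements indeterminate, and reading them is undefined behavior, so the sketch deliberately does not read from such an array.

```cpp
#include <cstddef>
#include <memory>
#include <tuple>

// new double[n]() value-initializes the array, so every element is 0.0.
// This is the form used in the question's corrected benchmark.
bool value_initialized_doubles_are_zero(std::size_t n) {
    std::unique_ptr<double[]> a(new double[n]());
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] != 0.0) return false;
    }
    return true;
}

// new std::tuple<double>[n] has no trailing (), but default-initializing
// a class type still runs its default constructor, and std::tuple's
// default constructor value-initializes each element. This is the extra
// zeroing loop that shows up in the assembly for the tuple build.
bool default_initialized_tuples_are_zero(std::size_t n) {
    std::unique_ptr<std::tuple<double>[]> a(new std::tuple<double>[n]);
    for (std::size_t i = 0; i < n; ++i) {
        if (std::get<0>(a[i]) != 0.0) return false;
    }
    return true;
}
```

Both functions return true for any n. The second one shows why the tuple build does extra work up front: it has to write a zero into every element before the summation loop even starts.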
