Question

I wrote a small performance test comparing setup and access times for three popular dynamic allocation techniques: raw pointers, std::unique_ptr, and std::deque.

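EDIT: Per @NathanOliver's suggestion, added std::vector.
EDIT 2: Per latedeveloper, allocated with the std::vector(n) and std::deque(n) constructors.
EDIT 3: Per @BaummitAugen, moved the allocation inside the timing loop and compiled an optimized build.
EDIT 4: Per @PaulMcKenzie's comments, set the number of runs to 2000.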

RESULTS: These changes tightened things up considerably. Deque and vector are still slower on allocation and assignment, and deque is much slower on access:

pickledEgg $ g++ -std=c++11 -o sp2 -O2 sp2.cpp

Average of 2000 runs:
Method  Assign          Access
======  ======          ======
Raw:    0.0000085643    0.0000000724
Smart:  0.0000085281    0.0000000732
Deque:  0.0000205775    0.0000076908
Vector: 0.0000163492    0.0000000760

Just for fun, here are the -Ofast results:
pickledEgg $ g++ -std=c++11 -o sp2 -Ofast sp2.cpp

Average of 2000 runs:
Method  Assign          Access
======  ======          ======
Raw:    0.0000045316    0.0000000893
Smart:  0.0000038308    0.0000000730
Deque:  0.0000165620    0.0000076475
Vector: 0.0000063442    0.0000000699

ORIGINAL: For posterity; note the lack of the optimizer -O2 flag:

pickledEgg $ g++ -std=c++11 -o sp2 sp2.cpp

Average of 100 runs:
Method  Assign          Access
======  ======          ======
Raw:    0.0000466522    0.0000468586
Smart:  0.0004391623    0.0004406758
Deque:  0.0003144142    0.0021758729
Vector: 0.0004715145    0.0003829193

Updated code:

#include <iostream>
#include <iomanip>
#include <vector>
#include <deque>
#include <chrono>
#include <memory>

const int NUM_RUNS(2000);

int main() {
    std::chrono::high_resolution_clock::time_point b, e;
    std::chrono::duration<double> t, raw_assign(0), raw_access(0), smart_assign(0), smart_access(0), deque_assign(0), deque_access(0), vector_assign(0), vector_access(0);
    int k, tmp, n(32768);

    std::cout << "Average of " << NUM_RUNS << " runs:" << std::endl; 
    std::cout << "Method " << '\t' << "Assign" << "\t\t" << "Access" << std::endl;
    std::cout << "====== " << '\t' << "======" << "\t\t" << "======" << std::endl;

    // Raw
    for (k=0; k<NUM_RUNS; ++k) {
        b = std::chrono::high_resolution_clock::now();
        int* raw_p = new int[n]; // run-time allocation
        for (int i=0; i<n; ++i) { //assign
            raw_p[i] = i;
        }
        e = std::chrono::high_resolution_clock::now();
        t = std::chrono::duration_cast<std::chrono::duration<double> >(e - b);
        raw_assign+=t;
        b = std::chrono::high_resolution_clock::now();
        for (int i=0; i<n; ++i) { //access
            tmp = raw_p[i];
        }
        e = std::chrono::high_resolution_clock::now();
        t = std::chrono::duration_cast<std::chrono::duration<double> >(e - b);
        raw_access+=t;
        delete [] raw_p; // :^)
    }
    raw_assign /= NUM_RUNS;
    raw_access /= NUM_RUNS;
    std::cout << "Raw:   " << '\t' << std::setprecision(10) << std::fixed << raw_assign.count() << '\t' << raw_access.count() << std::endl;

    // Smart
    for (k=0; k<NUM_RUNS; ++k) {
        b = std::chrono::high_resolution_clock::now();
        std::unique_ptr<int []> smart_p(new int[n]); // run-time allocation
        for (int i=0; i<n; ++i) { //assign
            smart_p[i] = i;
        }
        e = std::chrono::high_resolution_clock::now();
        t = std::chrono::duration_cast<std::chrono::duration<double> >(e - b);
        smart_assign+=t;
        b = std::chrono::high_resolution_clock::now();
        for (int i=0; i<n; ++i) { //access
            tmp = smart_p[i];
        }
        e = std::chrono::high_resolution_clock::now();
        t = std::chrono::duration_cast<std::chrono::duration<double> >(e - b);
        smart_access+=t;
    }
    smart_assign /= NUM_RUNS;
    smart_access /= NUM_RUNS;
    std::cout << "Smart: " << '\t' << std::setprecision(10) << std::fixed << smart_assign.count() << '\t' << smart_access.count() << std::endl;

    // Deque
    for (k=0; k<NUM_RUNS; ++k) {
        b = std::chrono::high_resolution_clock::now();
        std::deque<int> myDeque(n);
        for (int i=0; i<n; ++i) { //assign
            myDeque[i] = i;
//          myDeque.push_back(i);
        }
        e = std::chrono::high_resolution_clock::now();
        t = std::chrono::duration_cast<std::chrono::duration<double> >(e - b);
        deque_assign+=t;
        b = std::chrono::high_resolution_clock::now();
        for (int i=0; i<n; ++i) { //access
            tmp = myDeque[i];
        }
        e = std::chrono::high_resolution_clock::now();
        t = std::chrono::duration_cast<std::chrono::duration<double> >(e - b);
        deque_access+=t;
    }
    deque_assign /= NUM_RUNS;
    deque_access /= NUM_RUNS;
    std::cout << "Deque: " << '\t' << std::setprecision(10) << std::fixed << deque_assign.count() << '\t' << deque_access.count() << std::endl;

    // vector
    for (k=0; k<NUM_RUNS; ++k) {
        b = std::chrono::high_resolution_clock::now();
        std::vector<int> myVector(n);
        for (int i=0; i<n; ++i) { //assign
            myVector[i] = i;
//          .push_back(i);
        }
        e = std::chrono::high_resolution_clock::now();
        t = std::chrono::duration_cast<std::chrono::duration<double> >(e - b);
        vector_assign+=t;
        b = std::chrono::high_resolution_clock::now();
        for (int i=0; i<n; ++i) { //access
            tmp = myVector[i];
//          tmp = *(myVector.begin() + i);
        }
        e = std::chrono::high_resolution_clock::now();
        t = std::chrono::duration_cast<std::chrono::duration<double> >(e - b);
        vector_access+=t;
    }
    vector_assign /= NUM_RUNS;
    vector_access /= NUM_RUNS;
    std::cout << "Vector:" << '\t' << std::setprecision(10) << std::fixed << vector_assign.count() << '\t' << vector_access.count() << std::endl;

    std::cout << std::endl;
    return 0;
}

Answer 1

As can be seen from the results, raw pointers are the clear winner in both categories. Why is this?

Because...

g++ -std=c++11 -o sp2 sp2.cpp

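...you did not enable optimization. Calling operators overloaded for non-fundamental types such as std::vector or std::unique_ptr involves a function call. Using operators on fundamental types, such as raw pointers, does not.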

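A function call is typically slower than no function call. Over many iterations, the small overhead of each call adds up. An optimizer can inline the calls, which removes the disadvantage of the non-fundamental types, but only when optimization is actually enabled.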
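As a rough sketch of the inlining point, the two functions below run the same loop through a raw pointer and through std::unique_ptr<int[]>. In an optimized build the operator[] call is normally inlined, so both tend to compile to similar code. Returning the sum also gives each loop an observable result; a loop that only assigns to an otherwise unused variable may be removed entirely by the optimizer. The function names are illustrative, not part of the question's code.

```cpp
#include <cstddef>
#include <memory>

// Sum n ints through a raw pointer: the subscript is a built-in operation.
long long sum_raw(const int* p, std::size_t n) {
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += p[i];
    }
    return total;
}

// Sum n ints through unique_ptr<int[]>: operator[] is a member function,
// which an optimizing build can inline down to plain pointer arithmetic.
long long sum_smart(const std::unique_ptr<int[]>& p, std::size_t n) {
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += p[i];
    }
    return total;
}

// Hypothetical helper: fill 0..n-1 and check that both paths agree.
bool sums_agree(std::size_t n) {
    std::unique_ptr<int[]> data(new int[n]);
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = static_cast<int>(i);
    }
    return sum_raw(data.get(), n) == sum_smart(data, n);
}
```

Timing sum_raw and sum_smart with and without -O2 is one way to see how much of the gap comes from calls that were not inlined.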

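std::deque has an additional reason to be slow: the algorithm for accessing an arbitrary element of a double-ended queue is more complicated than accessing an array. While std::deque has decent random access performance, it is not as good as an array's. A more appropriate use case for std::deque is linear iteration (using iterators).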
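To illustrate why random access into a deque costs more than into an array, here is a toy chunked container. It is only a sketch of the general idea of block-based storage, not how any particular standard library implements std::deque; the class name and kBlockSize are assumptions made up for this example.

```cpp
#include <cstddef>
#include <vector>

// Arbitrary block size chosen for illustration only.
constexpr std::size_t kBlockSize = 512;

// Toy container that stores ints in fixed-size blocks.
class ChunkedArray {
public:
    explicit ChunkedArray(std::size_t n)
        : size_(n),
          blocks_((n + kBlockSize - 1) / kBlockSize,
                  std::vector<int>(kBlockSize, 0)) {}

    // Two steps per access: pick the block, then the slot inside it.
    // A plain array needs only a single base-plus-offset computation.
    int& operator[](std::size_t i) {
        return blocks_[i / kBlockSize][i % kBlockSize];
    }

    std::size_t size() const { return size_; }

private:
    std::size_t size_;
    std::vector<std::vector<int>> blocks_;
};
```

The extra division and the second pointer hop are cheap, but they are paid on every element access, which adds up in a tight loop.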

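Furthermore, you use std::deque::at, which does bounds checking. The subscript operator does not do bounds checking. Bounds checking adds runtime overhead.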
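The difference between the two accessors can be shown with a small check. at() validates the index and throws std::out_of_range when it is invalid, whereas operator[] performs no check, so an out-of-range subscript is undefined behavior. The helper name below is made up for the example.

```cpp
#include <cstddef>
#include <deque>
#include <stdexcept>

// Returns true if d.at(i) throws std::out_of_range for the given index.
bool at_rejects(const std::deque<int>& d, std::size_t i) {
    try {
        (void)d.at(i);  // the index is checked on every call
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

For a deque of four elements, at_rejects(d, 3) is false and at_rejects(d, 4) is true. The comparison that makes this possible is the per-access cost mentioned above.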

The slight edge that the raw array appears to have over std::vector in allocation speed may be because std::vector zero-initializes its data.
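A short sketch of the initialization difference: std::vector<int>(n) value-initializes every element to 0, while new int[n] leaves the values indeterminate. Writing new int[n]() makes the raw allocation do the same zeroing, which is one way to compare like with like. The function names are illustrative.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// std::vector<int>(n) value-initializes its elements, so every int is 0.
bool vector_is_zeroed(std::size_t n) {
    std::vector<int> v(n);
    for (int x : v) {
        if (x != 0) return false;
    }
    return true;
}

// new int[n]() (note the trailing parentheses) also zeroes the array.
// Plain new int[n] would leave the values indeterminate, so they must
// not be read before being written.
bool value_initialized_array_is_zeroed(std::size_t n) {
    std::unique_ptr<int[]> p(new int[n]());
    for (std::size_t i = 0; i < n; ++i) {
        if (p[i] != 0) return false;
    }
    return true;
}
```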

Answer 2

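std::deque is not a contiguous array. It is typically implemented as a sequence of fixed-size blocks, so myDeque.at(i) has to locate the right block and then the element within it on every call, in addition to checking the index. That is why access to the deque is noticeably slower.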

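Initialization of std::vector is slow because you did not preallocate enough memory. The vector then starts with room for a small number of elements and grows, typically by a constant factor (often doubling), whenever you try to insert more. Each reallocation involves moving or copying all existing elements. Try constructing the vector like this: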

std::vector<int> myVector(n);
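One detail worth checking when pre-sizing a vector is the bracket style. With parentheses, std::vector<int> v(n) creates n elements; with braces, std::vector<int> v{n} selects the initializer-list constructor and creates a single element whose value is n. If the goal is to keep using push_back, reserve() is the usual way to allocate up front. The helpers below are a minimal sketch with made-up names.

```cpp
#include <cstddef>
#include <vector>

// Parentheses: n value-initialized elements.
std::size_t size_with_parens(int n) {
    std::vector<int> v(n);
    return v.size();
}

// Braces: initializer-list constructor, a single element holding n.
std::size_t size_with_braces(int n) {
    std::vector<int> v{n};
    return v.size();
}

// reserve() allocates up front without creating elements, so push_back
// does not reallocate while size() stays within the reserved capacity.
bool reserve_avoids_reallocation(int n) {
    std::vector<int> v;
    v.reserve(static_cast<std::size_t>(n));
    const int* before = v.data();
    for (int i = 0; i < n; ++i) {
        v.push_back(i);
    }
    return v.data() == before && v.size() == static_cast<std::size_t>(n);
}
```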

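For vector access, I wonder why you did not use tmp = myVector[i]. Instead of calling the index operator, you instantiate an iterator, call its + operator, and call the dereference operator on the result. Since you are not optimizing, the function calls will probably not be inlined, which is why std::vector access is slower than a raw pointer.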
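For reference, the two access styles being compared look like this. Both return the same element; the iterator form simply goes through three operations (begin(), operator+, operator*) where the index form goes through one, which matters mainly when nothing is inlined. The function names are illustrative.

```cpp
#include <cstddef>
#include <vector>

// One call: the subscript operator.
int by_index(const std::vector<int>& v, std::size_t i) {
    return v[i];
}

// Three calls: begin(), iterator operator+, and the dereference.
int by_iterator(const std::vector<int>& v, std::size_t i) {
    return *(v.begin() +
             static_cast<std::vector<int>::difference_type>(i));
}
```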

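For std::unique_ptr, I think the reason is similar to std::vector's. You always call the index operator on the unique pointer, which is also a function call. As an experiment, right after allocating the memory for smart_p, you could call smart_p.get() and use the raw pointer for the rest of the operations. I assume it will then be as fast as the raw pointer, which would support my assumption that the function calls are the cause. The simple advice, then, is to enable optimization and try again.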
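The experiment could look roughly like the sketch below: the unique_ptr still owns the array, and get() only hands out the underlying pointer for the loops. The function name is made up, and the timing code is left out so the sketch stays short; wrapping the two loops in the question's chrono calls would complete it.

```cpp
#include <cstddef>
#include <memory>

// Fill and read through the raw pointer obtained from get(). Ownership
// stays with smart_p, which frees the array when it goes out of scope.
long long fill_and_sum_via_get(std::size_t n) {
    std::unique_ptr<int[]> smart_p(new int[n]);
    int* raw = smart_p.get();  // no ownership transfer
    for (std::size_t i = 0; i < n; ++i) {  // assign
        raw[i] = static_cast<int>(i);
    }
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i) {  // access
        total += raw[i];
    }
    return total;
}
```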

kmiklas edit:

Average of 2000 runs:
Method  Assign          Access
======  ======          ======
Raw:    0.0000086415    0.0000000681
Smart:  0.0000081824    0.0000000670
Deque:  0.0000204542    0.0000076554
Vector: 0.0000164252    0.0000000678

Dynamic allocation and random access: raw, smart, deque, vector. Why is raw so fast, and deque so slow?
