Question

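I wrote a simple program to compare the performance of the STL list against a simple C-style list data structure. It shows poor performance at the "push_back()" line. Any comments?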

$ ./test2
 Build the type list : time consumed -> 0.311465
 Iterate over all items: time consumed -> 0.00898
 Build the simple C List: time consumed -> 0.020275
 Iterate over all items: time consumed -> 0.008755

The source code is:

#include <stdexcept>
#include "high_resolution_timer.hpp"

#include <cstdio>    /* printf */
#include <cstdlib>   /* malloc */
#include <list>
#include <algorithm>
#include <iostream>


#define TESTNUM 1000000

/* The test struct */
struct MyType {
    int num;
};


/*
 * C++ STL::list Test
 */
typedef struct MyType* mytype_t;

void myfunction(mytype_t t) {
}

int test_stl_list()
{
    std::list<mytype_t> mylist;
    util::high_resolution_timer t;

    /*
     * Build the type list
     */
    t.restart();
    for(int i = 0; i < TESTNUM; i++) {
        mytype_t aItem = new MyType;  /* allocate the element the list will point to */
        aItem->num = i;
        mylist.push_back(aItem);
    }
    std::cout << " Build the type list : time consumed -> " << t.elapsed() << std::endl;


    /*
     * Iterate over all item
     */
    t.restart();
    std::for_each(mylist.begin(), mylist.end(), myfunction);
    std::cout << " Iterate over all items: time consumed -> " << t.elapsed() << std::endl;

    return 0;
}

/*
 * a simple C list
 */
struct MyCList;
struct MyCList{
    struct MyType m;
    struct MyCList* p_next;
};

int test_simple_c_list()
{
    struct MyCList* p_list_head = NULL;
    util::high_resolution_timer t;

    /*
     * Build it
     */
    t.restart();
    struct MyCList* p_new_item = NULL;
    for(int i = 0; i < TESTNUM; i++) {
        p_new_item = (struct MyCList*) malloc(sizeof(struct MyCList));
        if(p_new_item == NULL) {
            printf("ERROR : while malloc\n");
            return -1;
        }
        p_new_item->m.num = i;
        p_new_item->p_next = p_list_head;
        p_list_head = p_new_item;
    }
    std::cout << " Build the simple C List: time consumed -> " << t.elapsed() << std::endl;

    /*
     * Iterate all items
     */
    t.restart();
    p_new_item = p_list_head;
    while(p_new_item->p_next != NULL) {
        p_new_item = p_new_item->p_next;
    }
    std::cout << " Iterate over all items: time consumed -> " << t.elapsed() << std::endl;


    return 0;
}

int main(int argc, char** argv)
{
    if(test_stl_list() != 0) {
        printf("ERROR: error at testcase1\n");
        return -1;
    }

    if(test_simple_c_list() != 0) {
        printf("ERROR: error at testcase2\n");
        return -1;
    }

    return 0;
}

Oops, yes. I modified the code, and now it shows:

$ ./test2
 Build the type list : time consumed -> 0.163724
 Iterate over all items: time consumed -> 0.005427
 Build the simple C List: time consumed -> 0.018797
 Iterate over all items: time consumed -> 0.004778

So my question is: why does my "push_back" code perform so poorly?

Answer 1

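One thing is that in C you have a linked list of objects, but in C++ you have a linked list of pointers (so, for one thing, you are doing twice as many allocations). To compare apples with apples, your STL code should be: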

int test_stl_list()
{
    std::list<MyType> mylist;
    util::high_resolution_timer t;

    /*
     * Build the type list
     */
    t.restart();
    for(int i = 0; i < TESTNUM; i++) {
        MyType aItem;
        aItem.num = i;
        mylist.push_back(aItem);
    }
    std::cout << " Build the type list : time consumed -> " << t.elapsed() << std::endl;


    return 0;
}

Answer 2

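First, it looks like you are doing a push_front, not a push_back, in your own implementation.

Second, you should also compare against std::slist for a fair comparison, because std::list is doubly linked.

Third, you need to use the right compiler flags for a fair comparison. With gcc, you should compile with at least -O2. Without optimization, the STL always performs badly, because nothing is inlined and there is a lot of function-call overhead.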

Answer 3

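It looks like your high_resolution_timer class is measuring more than just the routines you want to measure. I would refactor the code so that the only code between t.restart() and t.elapsed() is the code you are keen on measuring. Any other code in there could have an unknown performance impact that may skew your results.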

Answer 4

Your STL code creates a memory block twice for each cell. The following is from STL 4.1.1 on x86_64:

void push_back(const value_type& __x)
{
   this->_M_insert(end(), __x);
}


// Inserts new element at position given and with value given.
void _M_insert(iterator __position, const value_type& __x)
{
   _Node* __tmp = _M_create_node(__x);     // Allocate a new space ####
   __tmp->hook(__position._M_node);
}

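As you can see, push_back() also calls several more functions before returning to the caller, and a few pointer values are copied each time one of those functions is called. That is probably negligible, though, because all the parameters are passed by const reference.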

Poor push_back performance with an STL list?
