2D or 1D: which is fastest?

Date: 2014-11-17 14:02:57

Tags: c++ optimization

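I have been searching the web (and Stack Overflow) for opinions on whether 1D arrays (or vectors) are faster than their 2D counterparts. The general conclusion seems to be that 1D is fastest. However, I wrote a short test program to see for myself, and it shows that 2D is best. Can anyone spot a bug in my test, or at least explain why I get this result?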

I am using this to store matrices, so I need to index the 1D array by row and column.

#include <iostream>
#include <chrono>
#include <vector>
#include <cstdint>   // uint64_t
#include <cstdlib>   // malloc, atoi

uint64_t timestamp()
{
    namespace sc = std::chrono;
    static auto start = sc::high_resolution_clock::now();
    return sc::duration_cast<sc::duration<uint64_t, std::micro>>(sc::high_resolution_clock::now() - start).count();
}

int main(int argc, char** argv)
{
    if (argc < 3)
        return 0;
    size_t size = atoi(argv[1]);
    size_t repeat = atoi(argv[2]);

    int** d2 = (int**)malloc(size*sizeof(int*));
    for (size_t i = 0; i < size; ++i)
        d2[i] = (int*)malloc(size*sizeof(int));

    int* d1 = (int*)malloc(size*size*sizeof(int));

    std::vector<std::vector<int> > d2v(size);
    for (auto& i : d2v)
        i.resize(size);

    std::vector<int> d1v(size*size);

    uint64_t start, end;
    timestamp();

    start = timestamp();
    for (size_t n = 0; n < repeat; ++n)
    {
        for (size_t r = 0; r < size; ++r)
        {
            for (size_t c = 0; c < size; ++c)
            {
                if (r == 0)
                    d2[r][c] = 0;
                else
                    d2[r][c] = d2[r-1][c] + 1;
            }
        }
    }
    end = timestamp();
    std::cout << "2D array\t" << size << "\t" << end - start << std::endl;

    start = timestamp();
    for (size_t n = 0; n < repeat; ++n)
    {
        for (size_t c = 0; c < size; ++c)
        {
            for (size_t r = 0; r < size; ++r)
            {
                if (r == 0)
                    d2[r][c] = 0;
                else
                    d2[r][c] = d2[r-1][c] + 1;
            }
        }
    }
    end = timestamp();
    std::cout << "2D array C\t" << size << "\t" << end - start << std::endl;

    start = timestamp();
    for (size_t n = 0; n < repeat; ++n)
    {
        for (size_t r = 0; r < size; ++r)
        {
            for (size_t c = 0; c < size; ++c)
            {
                if (r == 0)
                    d1[r + c*size] = 0;
                else
                    d1[r + c*size] = d1[r-1 + c*size] + 1;
            }
        }
    }
    end = timestamp();
    std::cout << "1D array\t" << size << "\t" << end - start << std::endl;

    start = timestamp();
    for (size_t n = 0; n < repeat; ++n)
    {
        for (size_t c = 0; c < size; ++c)
        {
            for (size_t r = 0; r < size; ++r)
            {
                if (r == 0)
                    d1[r + c*size] = 0;
                else
                    d1[r + c*size] = d1[r-1 + c*size] + 1;
            }
        }
    }
    end = timestamp();
    std::cout << "1D array C\t" << size << "\t" << end - start << std::endl;

    start = timestamp();
    for (size_t n = 0; n < repeat; ++n)
    {
        for (size_t r = 0; r < size; ++r)
        {
            for (size_t c = 0; c < size; ++c)
            {
                if (r == 0)
                    d2v[r][c] = 0;
                else
                    d2v[r][c] = d2v[r-1][c] + 1;
            }
        }
    }
    end = timestamp();
    std::cout << "2D vector\t" << size << "\t" << end - start << std::endl;

    start = timestamp();
    for (size_t n = 0; n < repeat; ++n)
    {
        for (size_t c = 0; c < size; ++c)
        {
            for (size_t r = 0; r < size; ++r)
            {
                if (r == 0)
                    d2v[r][c] = 0;
                else
                    d2v[r][c] = d2v[r-1][c] + 1;
            }
        }
    }
    end = timestamp();
    std::cout << "2D vector C\t" << size << "\t" << end - start << std::endl;

    start = timestamp();
    for (size_t n = 0; n < repeat; ++n)
    {
        for (size_t r = 0; r < size; ++r)
        {
            for (size_t c = 0; c < size; ++c)
            {
                if (r == 0)
                    d1v[r + c*size] = 0;
                else
                    d1v[r + c*size] = d1v[r-1 + c*size] + 1;
            }
        }
    }
    end = timestamp();
    std::cout << "1D vector\t" << size << "\t" << end - start << std::endl;

    start = timestamp();
    for (size_t n = 0; n < repeat; ++n)
    {
        for (size_t c = 0; c < size; ++c)
        {
            for (size_t r = 0; r < size; ++r)
            {
                if (r == 0)
                    d1v[r + c*size] = 0;
                else
                    d1v[r + c*size] = d1v[r-1 + c*size] + 1;
            }
        }
    }
    end = timestamp();
    std::cout << "1D vector C\t" << size << "\t" << end - start << std::endl;

    return 0;
}

I get the following output:

user@user-debian64:~/matrix$ ./build/test/index_test 1000 100
2D array    1000    79593
2D array C  1000    326695
1D array    1000    440695
1D array C  1000    262251
2D vector   1000    73648
2D vector C 1000    418287
1D vector   1000    371433
1D vector C 1000    269355
user@user-debian64:~/matrix$ ./build/test/index_test 10000 1
2D array    10000   149748
2D array C  10000   3507346
1D array    10000   2754570
1D array C  10000   257997
2D vector   10000   92041
2D vector C 10000   3791745
1D vector   10000   3384403
1D vector C 10000   266811

2 answers:

Answer 0 (score: 2)

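The way you iterate over the 1D array is wrong. You don't need nested loops for a 1D array. They are not only unnecessary, they also add extra arithmetic to compute the index. Instead of this part,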

for (size_t c = 0; c < size; ++c)
{
    for (size_t r = 0; r < size; ++r)
    {
        if (r == 0)
            d1[r + c*size] = 0;
        else
            d1[r + c*size] = d1[r-1 + c*size] + 1;
    }
}

you should write

for (size_t r = 0; r < size*size; ++r)
{
    if (r == 0)
        d1[r] = 0;
    else
        d1[r] = d1[r-1] + 1;
}

and it will be fine.

Answer 1 (score: 2)

The root of the problem is that your storage order differs between the two schemes.

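Your 2D structures are stored row-major. By dereferencing the row first, you arrive at a single buffer that can be indexed directly by column. Adjacent columns are in adjacent memory locations.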

Your 1D structures are stored column-major. Adjacent columns are size elements apart in memory.
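To make the two layouts concrete, here is a minimal sketch of the offset arithmetic for a size x size matrix held in one contiguous buffer. The helper names row_major and col_major are made up for illustration and do not appear in the question.

```cpp
#include <cstddef>

// Hypothetical helpers (not from the question): offset of element (r, c)
// in a size x size matrix stored in one contiguous buffer.

// Row-major: each row is contiguous, so adjacent columns are 1 element apart.
inline std::size_t row_major(std::size_t r, std::size_t c, std::size_t size)
{
    return r * size + c;
}

// Column-major: each column is contiguous, so adjacent columns are
// size elements apart. This is the r + c*size form used in the question.
inline std::size_t col_major(std::size_t r, std::size_t c, std::size_t size)
{
    return r + c * size;
}
```

With size = 10, stepping from column 0 to column 1 in the same row moves the offset by 1 in the row-major form and by 10 in the column-major form.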

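Trying both iteration orders covers almost all of the effect. What remains is the data dependency: because each element references D(r-1,c), the row-major and column-major access patterns are completely different.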

Sure enough, changing the 1D indexing to d1[r*size + c] and d1[(r-1)*size + c] yields the following timings:

2D array    1000    78099
2D array C  1000    878527
1D array    1000    19661
1D array C  1000    729280
2D vector   1000    61641
2D vector C 1000    741249
1D vector   1000    18348
1D vector C 1000    726231
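For reference, the changed 1D loop might look roughly like the sketch below. The function name fill_row_major is an assumption made for illustration (the question inlines this loop in main()), and it uses std::vector instead of malloc to keep the example short.

```cpp
#include <cstddef>
#include <vector>

// Sketch of the benchmark kernel with row-major 1D indexing.
// fill_row_major is a hypothetical name, not part of the original program.
std::vector<int> fill_row_major(std::size_t size)
{
    std::vector<int> m(size * size);
    for (std::size_t r = 0; r < size; ++r)
    {
        for (std::size_t c = 0; c < size; ++c)
        {
            if (r == 0)
                m[r * size + c] = 0;                        // first row starts at 0
            else
                m[r * size + c] = m[(r - 1) * size + c] + 1; // element directly above, plus 1
        }
    }
    return m;
}
```

Every element ends up equal to its row index, the same result the original loops compute. Only the memory layout differs.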

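So we still need to explain it. I'm going with "loop-carried dependency". When you iterate over a column-major 1D array in column-major order (a good idea), each element depends on the element computed in the previous iteration. That means the loop cannot be fully pipelined, because the result must be fully computed and written back to cache before it can be read again to compute the next element. In row-major order, the dependency is on an element computed long ago, which means the loop can be unrolled and pipelined.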
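The two dependency patterns can be isolated in a small sketch. The function names fill_dependent and fill_independent are hypothetical, and whether a given compiler actually unrolls or vectorizes the second loop is an assumption that depends on the compiler and flags.

```cpp
#include <cstddef>
#include <vector>

// Dependency distance 1: iteration r reads the value that iteration r-1
// just wrote, so the iterations form a serial chain.
void fill_dependent(std::vector<int>& col)
{
    for (std::size_t r = 0; r < col.size(); ++r)
        col[r] = (r == 0) ? 0 : col[r - 1] + 1;
}

// No dependency between iterations: each element of cur depends only on
// prev, which was completed earlier. Assumes prev.size() >= cur.size().
void fill_independent(const std::vector<int>& prev, std::vector<int>& cur)
{
    for (std::size_t c = 0; c < cur.size(); ++c)
        cur[c] = prev[c] + 1;
}
```

The first function corresponds to walking down one contiguous column, the second to filling one contiguous row from the row above it.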