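I have been searching the web (and Stack Overflow) for opinions on whether 1D arrays (or vectors) are faster than their 2D counterparts. The general conclusion seems to be that 1D is the fastest. However, I wrote a short test program to see for myself, and it shows that 2D is best. Can anyone spot a mistake in my test, or at least explain why I get this result?
I use this to store matrices, so I need to index the 1D array by row and column.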
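A common way to index a 1D buffer by row and column is to wrap it in a small class that maps (row, column) to a flat offset. The sketch below assumes row-major order, the same order as the d2[r][c] layout in the test; the class name Matrix and its members are hypothetical and not from the original post.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: a rows x cols matrix of int stored contiguously, row-major.
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}  // elements start at 0

    // Element (r, c) lives at offset r * cols + c, so adjacent columns are adjacent in memory.
    int& operator()(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }
    int operator()(std::size_t r, std::size_t c) const {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<int> data_;
};
```

With such a wrapper, the benchmark's inner statement would read m(r, c) = m(r - 1, c) + 1, and keeping c in the inner loop walks the buffer sequentially.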
#include <iostream>
#include <chrono>
#include <vector>
#include <cstdint>   // uint64_t
#include <cstdlib>   // std::malloc, std::free, std::atoi
// Microseconds elapsed since the first call.
uint64_t timestamp()
{
    namespace sc = std::chrono;
    static auto start = sc::high_resolution_clock::now();
    return sc::duration_cast<sc::duration<uint64_t, std::micro>>(sc::high_resolution_clock::now() - start).count();
}
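int main(int argc, char** argv)
{
    // Usage: index_test <size> <repeat>
    if (argc < 3)
        return 0;
    size_t size = std::atoi(argv[1]);
    size_t repeat = std::atoi(argv[2]);
    // 2D array: an array of row pointers, each row allocated separately (row-major)
    int** d2 = (int**)std::malloc(size*sizeof(int*));
    for (size_t i = 0; i < size; ++i)
        d2[i] = (int*)std::malloc(size*sizeof(int));
    // 1D array: one contiguous block, indexed below as d1[r + c*size] (column-major)
    int* d1 = (int*)std::malloc(size*size*sizeof(int));
    std::vector<std::vector<int> > d2v(size);
    for (auto& i : d2v)
        i.resize(size);
    std::vector<int> d1v(size*size);
    uint64_t start, end;
    timestamp();  // first call fixes the reference time point
    start = timestamp();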
    // 2D array, row order: r outer, c inner (sequential within each row)
    for (size_t n = 0; n < repeat; ++n)
    {
        for (size_t r = 0; r < size; ++r)
        {
            for (size_t c = 0; c < size; ++c)
            {
                if (r == 0)
                    d2[r][c] = 0;
                else
                    d2[r][c] = d2[r-1][c] + 1;
            }
        }
    }
    end = timestamp();
    std::cout << "2D array\t" << size << "\t" << end - start << std::endl;
    start = timestamp();
    // 2D array, column order ("C"): c outer, r inner (jumps to a different row every step)
    for (size_t n = 0; n < repeat; ++n)
    {
        for (size_t c = 0; c < size; ++c)
        {
            for (size_t r = 0; r < size; ++r)
            {
                if (r == 0)
                    d2[r][c] = 0;
                else
                    d2[r][c] = d2[r-1][c] + 1;
            }
        }
    }
    end = timestamp();
    std::cout << "2D array C\t" << size << "\t" << end - start << std::endl;
    start = timestamp();
    // 1D array, row order: d1 is column-major, so the inner c loop strides by size
    for (size_t n = 0; n < repeat; ++n)
    {
        for (size_t r = 0; r < size; ++r)
        {
            for (size_t c = 0; c < size; ++c)
            {
                if (r == 0)
                    d1[r + c*size] = 0;
                else
                    d1[r + c*size] = d1[r-1 + c*size] + 1;
            }
        }
    }
    end = timestamp();
    std::cout << "1D array\t" << size << "\t" << end - start << std::endl;
    start = timestamp();
    // 1D array, column order ("C"): the inner r loop is sequential in memory
    for (size_t n = 0; n < repeat; ++n)
    {
        for (size_t c = 0; c < size; ++c)
        {
            for (size_t r = 0; r < size; ++r)
            {
                if (r == 0)
                    d1[r + c*size] = 0;
                else
                    d1[r + c*size] = d1[r-1 + c*size] + 1;
            }
        }
    }
    end = timestamp();
    std::cout << "1D array C\t" << size << "\t" << end - start << std::endl;
    start = timestamp();
    // 2D vector, row order: r outer, c inner
    for (size_t n = 0; n < repeat; ++n)
    {
        for (size_t r = 0; r < size; ++r)
        {
            for (size_t c = 0; c < size; ++c)
            {
                if (r == 0)
                    d2v[r][c] = 0;
                else
                    d2v[r][c] = d2v[r-1][c] + 1;
            }
        }
    }
    end = timestamp();
    std::cout << "2D vector\t" << size << "\t" << end - start << std::endl;
    start = timestamp();
    // 2D vector, column order ("C"): c outer, r inner
    for (size_t n = 0; n < repeat; ++n)
    {
        for (size_t c = 0; c < size; ++c)
        {
            for (size_t r = 0; r < size; ++r)
            {
                if (r == 0)
                    d2v[r][c] = 0;
                else
                    d2v[r][c] = d2v[r-1][c] + 1;
            }
        }
    }
    end = timestamp();
    std::cout << "2D vector C\t" << size << "\t" << end - start << std::endl;
    start = timestamp();
    // 1D vector, row order: column-major indexing, inner c loop strides by size
    for (size_t n = 0; n < repeat; ++n)
    {
        for (size_t r = 0; r < size; ++r)
        {
            for (size_t c = 0; c < size; ++c)
            {
                if (r == 0)
                    d1v[r + c*size] = 0;
                else
                    d1v[r + c*size] = d1v[r-1 + c*size] + 1;
            }
        }
    }
    end = timestamp();
    std::cout << "1D vector\t" << size << "\t" << end - start << std::endl;
    start = timestamp();
    // 1D vector, column order ("C"): inner r loop is sequential in memory
    for (size_t n = 0; n < repeat; ++n)
    {
        for (size_t c = 0; c < size; ++c)
        {
            for (size_t r = 0; r < size; ++r)
            {
                if (r == 0)
                    d1v[r + c*size] = 0;
                else
                    d1v[r + c*size] = d1v[r-1 + c*size] + 1;
            }
        }
    }
    end = timestamp();
    std::cout << "1D vector C\t" << size << "\t" << end - start << std::endl;
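    // release the malloc'd buffers
    for (size_t i = 0; i < size; ++i)
        std::free(d2[i]);
    std::free(d2);
    std::free(d1);
    return 0;
}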
I get the following output (columns: test, size, elapsed time in microseconds):
user@user-debian64:~/matrix$ ./build/test/index_test 1000 100
2D array 1000 79593
2D array C 1000 326695
1D array 1000 440695
1D array C 1000 262251
2D vector 1000 73648
2D vector C 1000 418287
1D vector 1000 371433
1D vector C 1000 269355
user@user-debian64:~/matrix$ ./build/test/index_test 10000 1
2D array 10000 149748
2D array C 10000 3507346
1D array 10000 2754570
1D array C 10000 257997
2D vector 10000 92041
2D vector C 10000 3791745
1D vector 10000 3384403
1D vector C 10000 266811
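Answer 0 (score: 2)
You are iterating over the 1D array the wrong way. You don't need nested loops for a one-dimensional array. Not only are they unnecessary, they also add extra arithmetic to compute the index. Instead of this part,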
for (size_t c = 0; c < size; ++c)
{
    for (size_t r = 0; r < size; ++r)
    {
        if (r == 0)
            d1[r + c*size] = 0;
        else
            d1[r + c*size] = d1[r-1 + c*size] + 1;
    }
}
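you should write
for (size_t i = 0; i < size*size; ++i)
{
    // i % size == 0 is the top of a column (r == 0 in the nested version)
    if (i % size == 0)
        d1[i] = 0;
    else
        d1[i] = d1[i-1] + 1;
}
and you're done.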
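A flattened loop only reproduces the nested one if it keeps the per-column restart at r == 0. The sketch below compares the two directly for the column-major layout used in the question; the function names fill_nested and fill_flat are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Nested loops from the question: column-major, index r + c*size,
// value restarts at 0 at the top of every column.
std::vector<int> fill_nested(std::size_t size) {
    std::vector<int> d1(size * size);
    for (std::size_t c = 0; c < size; ++c)
        for (std::size_t r = 0; r < size; ++r)
            d1[r + c * size] = (r == 0) ? 0 : d1[r - 1 + c * size] + 1;
    return d1;
}

// One flat loop over the same buffer; i % size == 0 marks the r == 0 positions.
std::vector<int> fill_flat(std::size_t size) {
    assert(size > 0);
    std::vector<int> d1(size * size);
    for (std::size_t i = 0; i < size * size; ++i)
        d1[i] = (i % size == 0) ? 0 : d1[i - 1] + 1;
    return d1;
}
```

Dropping the i % size test would turn the contents into 0, 1, 2, ..., size*size - 1, which is a different matrix from the one the question computes.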
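Answer 1 (score: 2)
The root of the problem is that your storage order differs between the two schemes.
Your 2D structure is stored row-major. By dereferencing the row first, you arrive at a buffer that you can index directly by column. Adjacent columns sit at adjacent memory locations.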
Your 1D structure is stored column-major. Adjacent columns are size elements apart in memory.
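The two layouts differ only in the offset formula. This sketch (function names are hypothetical) writes both out so the strides can be checked: in row-major order a step to the next column moves 1 element, while in column-major order it moves rows elements.

```cpp
#include <cassert>
#include <cstddef>

// Row-major, like d2[r][c] or d2v[r][c]: element (r, c) at r * cols + c.
std::size_t row_major_index(std::size_t r, std::size_t c,
                            std::size_t rows, std::size_t cols) {
    assert(r < rows && c < cols);
    return r * cols + c;
}

// Column-major, like d1[r + c*size] in the question: element (r, c) at r + c * rows.
std::size_t col_major_index(std::size_t r, std::size_t c,
                            std::size_t rows, std::size_t cols) {
    assert(r < rows && c < cols);
    return r + c * rows;
}
```

For a square size x size matrix, rows and cols are both size, so the column-major stride between adjacent columns is exactly the size elements mentioned above.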
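Trying both iteration orders accounts for almost all of this effect. What remains is the data dependency: because each element references D(r-1,c), the row-major and column-major access patterns behave completely differently.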
Sure enough, changing the 1D indexing to d1[r*size + c] and d1[(r-1)*size + c] produces the following timings:
2D array 1000 78099
2D array C 1000 878527
1D array 1000 19661
1D array C 1000 729280
2D vector 1000 61641
2D vector C 1000 741249
1D vector 1000 18348
1D vector C 1000 726231
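The indexing change described above, applied to the row-order 1D loop, can be sketched as a standalone function; the name fill_row_major is hypothetical, and a std::vector<int> stands in for the malloc'd buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major version of the question's 1D loop: rows outer, columns inner,
// element (r, c) at r*size + c, so the inner loop walks memory sequentially.
void fill_row_major(std::vector<int>& d1, std::size_t size) {
    assert(d1.size() == size * size);
    for (std::size_t r = 0; r < size; ++r)
        for (std::size_t c = 0; c < size; ++c)
            d1[r * size + c] = (r == 0) ? 0 : d1[(r - 1) * size + c] + 1;
}
```

After the call, every element in row r holds the value r, which is the same matrix the 2D versions compute.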
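So we still need to explain that. My explanation is a loop-carried dependency. When you iterate over the column-major 1D array in column-major order (normally a good idea), each element depends on the element computed in the immediately preceding iteration. This means the loop cannot be fully pipelined, because each result must be completely computed and written back (to cache) before it can be read again to compute the next element. In row-major order, the dependency is instead on an element computed long before (a whole row, size iterations, earlier), which means the loop can be unrolled and pipelined.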
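The effect of dependency distance can be seen in isolation with two flat loops over one buffer. In the first, each iteration reads the value written by the iteration just before it (distance 1, the column-order case above); in the second, it reads a value written k iterations earlier (distance k, which corresponds to k = size in the row-major case). The function names are hypothetical, and whether a particular compiler unrolls or vectorizes the second loop depends on the compiler and flags, so treat this as an illustration of the dependency structure rather than a benchmark.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Distance 1: every iteration needs the result of the previous one,
// so the additions form a single serial chain.
void recurrence_distance_1(std::vector<int>& a) {
    for (std::size_t i = 0; i < a.size(); ++i)
        a[i] = (i == 0) ? 0 : a[i - 1] + 1;
}

// Distance k: every iteration needs a value written k iterations earlier,
// so any k consecutive iterations are independent of one another and can overlap.
void recurrence_distance_k(std::vector<int>& a, std::size_t k) {
    assert(k > 0);
    for (std::size_t i = 0; i < a.size(); ++i)
        a[i] = (i < k) ? 0 : a[i - k] + 1;
}
```

Both loops do the same amount of work per element; only the length of the chain each addition waits on differs.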