Question

I have a 3D array that is stored columnwise as a one-dimensional array. For example,

for( int k = 0; k < nk; k++ ) // Loop through the height.
    for( int j = 0; j < nj; j++ ) // Loop through the rows.
        for( int i = 0; i < ni; i++ ) // Loop through the columns.
        {
            ijk = i + ni * j + ni * nj * k;
            my3Darray[ ijk ] = 1.0;
        }

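For my application, I need to access all rows/columns/heights of my3Darray. By heights I mean the vectors along the third dimension of the array. I need this because I want to compute the FFT of each vector and store the resulting vector. I would appreciate ideas from the Stack Overflow community on how to access these vectors efficiently. A trivial approach, shown here for the height vectors, would be: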

for( int i = 0; i < ni; i++ ) // Loop through the columns.
    for( int j = 0; j < nj; j++ ) // Loop through the rows.
    {
        for( int k = 0; k < nk; k++ ) // Loop through the heights.
        {
            ijk = i + ni * j + ni * nj * k;
            myvec[ k ] = my3Darray[ ijk ];
        }
        fft( myvec, myvec_processed ); // Transform once the whole height vector has been gathered.

        // Store myvec_processed in a new array, my3Darray_fft_values.
        for( int k = 0; k < nk; k++ ) // Loop through the heights.
        {
            ijk = i + ni * j + ni * nj * k;
            my3Darray_fft_values[ ijk ] = myvec_processed[ k ];
        }
    }

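Am I computing this efficiently? Is it possible to pass my3Darray directly to the function that computes the FFT of a vector (instead of copying the vector into myvec)?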

Answer 1

You can reduce the number of multiplications by precomputing the stride like this:

...
for( int j = 0; j < nj; j++ ) // Loop through the rows.
{
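    const int stride = ni * nj;   // Distance between consecutive elements of a height vector.
    ijk = i + ni * j;             // Index of the first element (k = 0).
    for( int k = 0; k < nk; k++ ) // Loop through the heights.
    {
        myvec[ k ] = my3Darray[ ijk ];
        ijk += stride;
    }
    fft( myvec, myvec_processed ); // Transform once the whole height vector has been gathered.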
}

But this will only speed things up a little. Because my3Darray is accessed non-sequentially, you will still run into cache problems.
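One possible way to soften the cache problem, which is a different technique from the stride trick, is to gather a whole row of height vectors at once, so that my3Darray is read in contiguous runs of ni values. The following is only a minimal sketch: the function name gather_row_of_heights, the buffer layout, and the use of std::vector<double> are assumptions made for illustration, not something taken from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Gather all ni height vectors of row j into buf (hypothetical helper).
// The vector for column i ends up in buf[i * nk] .. buf[i * nk + nk - 1],
// so each one can be handed to a transform as a contiguous block.
void gather_row_of_heights(const std::vector<double>& a,
                           std::size_t ni, std::size_t nj, std::size_t nk,
                           std::size_t j, std::vector<double>& buf) {
    assert(a.size() == ni * nj * nk && j < nj);
    buf.resize(ni * nk);
    for (std::size_t k = 0; k < nk; ++k) {
        // For fixed j and k, the ni values along i are contiguous in memory.
        const double* src = a.data() + ni * j + ni * nj * k;
        for (std::size_t i = 0; i < ni; ++i)
            buf[i * nk + k] = src[i];
    }
}
```

The reads from the large array are now sequential within each run; the strided writes go into a buffer of only ni * nk values, which is more likely to stay in cache. Whether this actually helps depends on the sizes involved, so it is worth measuring.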

Answer 2

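When everything is reduced to its innermost bits and bytes, your three-dimensional array is of course stored in one-dimensional memory anyway. So, given the three indices of an array element, the compiler generates almost the same code to compute the element's location as you would write yourself. Surprise!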

So, in other words, it's pretty much the same thing.
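To make the equivalence concrete, here is a small sketch that compares the offset of an element in a declared 3D array with the manual index formula from the question. The sizes NI, NJ, NK and the helper names flat_index and builtin_offset are hypothetical, picked only for illustration. Note that the slowest-varying index comes first in the declaration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed sizes, chosen only for illustration.
constexpr std::size_t NI = 4, NJ = 3, NK = 2;

// Manual index computation, as in the question.
constexpr std::size_t flat_index(std::size_t i, std::size_t j, std::size_t k) {
    return i + NI * j + NI * NJ * k;
}

// Byte offset of a[k][j][i] from the start of a declared 3D array.
std::size_t builtin_offset(std::size_t i, std::size_t j, std::size_t k) {
    static double a[NK][NJ][NI];
    assert(i < NI && j < NJ && k < NK);
    const char* base = reinterpret_cast<const char*>(a);
    const char* elem = reinterpret_cast<const char*>(&a[k][j][i]);
    return static_cast<std::size_t>(elem - base);
}
```

For every valid (i, j, k), builtin_offset(i, j, k) should equal flat_index(i, j, k) * sizeof(double), which is the sense in which the two approaches are the same.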

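With an explicit three-dimensional array, the only thing the compiler might do better is exploit the fact that it knows the sizes of all the inner dimensions. If the size of the innermost dimension slice happens to be convenient, say a power of 2, the compiler may replace some of the multiplications with equivalent left shifts, which I suppose would be slightly faster than a full multiply instruction. But I would be surprised if this turned out to make a large difference in performance.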

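What is likely to matter more is the relative order of the dimensions: choose it so that the typical access pattern of the transform is friendlier to the CPU cache.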
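As a rough illustration of what reordering the dimensions could look like, the sketch below repacks the array so that k varies fastest. Each height vector then occupies nk contiguous values and could be passed to a transform directly, without copying into myvec. The names repack_k_fastest and height_vector and the use of std::vector<double> are assumptions for the example; the one-time repacking has its own cost, so this only pays off if the height vectors are accessed often enough.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Repack from the question's layout (i fastest) into a k-fastest layout
// (hypothetical helper): element (i, j, k) moves to k + nk * (j + nj * i).
std::vector<double> repack_k_fastest(const std::vector<double>& a,
                                     std::size_t ni, std::size_t nj, std::size_t nk) {
    assert(a.size() == ni * nj * nk);
    std::vector<double> b(a.size());
    for (std::size_t k = 0; k < nk; ++k)
        for (std::size_t j = 0; j < nj; ++j)
            for (std::size_t i = 0; i < ni; ++i)
                b[k + nk * (j + nj * i)] = a[i + ni * j + ni * nj * k];
    return b;
}

// Pointer to the nk contiguous values of the height vector at (i, j)
// in the k-fastest layout produced above.
const double* height_vector(const std::vector<double>& b,
                            std::size_t i, std::size_t j,
                            std::size_t nj, std::size_t nk) {
    return b.data() + nk * (j + nj * i);
}
```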

Efficiently accessing a 3D array stored as a 1D array
