Question

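In a numerical simulation I am developing, I have to perform many 2D discrete Fourier transforms, for which I use FFTW, as well as element-wise multiplications of arrays.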

I use the following container for my data:

std::vector<std::complex<float>, fftwAllocator<std::complex<float>>> data(LX*LY);

LX and LY are not necessarily equal. fftwAllocator is a custom allocator that uses fftw_malloc() so that the memory is properly aligned.

Currently, my element-wise multiplication looks like this:

Wave &operator*=(const Wave &m) {
    for(unsigned int i = 0; i < LX * LY; i++)
        _data[i] *= m._data[i];

    return *this;
}

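I know the compiler probably does a lot of magic already, but since my arrays are already aligned in a SIMD-compatible way by fftw_malloc(), I thought I could use vector instructions here to speed things up even further.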

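Is there an easy way to introduce platform-independent vector instructions here? I am really surprised that a simple element-wise vector multiplication is not included in FFTW in some form, since many people use it to convolve signals...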

Answer 1

As Peter Cordes suggested in the comments on my question, gcc is able to vectorize some loops on its own, and this can be checked with the compiler flag -fopt-info-vec-all.
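To try the flag on something small first, a sketch like the one below can be compiled on its own. The file name vec_check.cpp and the function multiply_inplace are made up for illustration and are not part of my simulation code. A plain float loop of this shape should normally show up as vectorized in the report, but the exact messages depend on the gcc version and target.

```cpp
// vec_check.cpp (hypothetical file name)
// Compile with:  g++ -O3 -fopt-info-vec-all -c vec_check.cpp
// and look for the loop below in the vectorization report.
#include <cstddef>
#include <vector>

// Element-wise product of two real-valued arrays, stored in place in a.
// Only the common prefix is processed if the sizes differ.
void multiply_inplace(std::vector<float>& a, const std::vector<float>& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    float* pa = a.data();
    const float* pb = b.data();
    for (std::size_t i = 0; i < n; ++i)
        pa[i] *= pb[i];
}
```

Running the same command on the file that contains the complex-valued loop shows whether that loop was vectorized or why it was skipped.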

It turned out, however, that complex& operator*=(const T& other); is not vectorized, so I had to replace the function from my question with the following:

Wave &operator*=(const Wave &m) {
    // the builtin product of std::complex is not
    // vectorized by gcc, so we're doing it manually
    // here.
    float tmp;
    for(unsigned int i = 0; i < _lx * _ly; i++) {
        tmp = _data[i].real();
        _data[i].real(_data[i].real() * m._data[i].real() - _data[i].imag() * m._data[i].imag());
        _data[i].imag(tmp * m._data[i].imag() + _data[i].imag() * m._data[i].real());
    }

    return *this;
}

Written this way, gcc -O3 successfully vectorizes the loop.
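For reference, here is a self-contained sketch of the same manual product as a free function, so it can be compiled and checked outside the Wave class. The names cfloat and multiply_manual are hypothetical, and a plain std::vector is used instead of the fftwAllocator-backed one to keep the example independent of FFTW. Note that the hand-written formula does not do any special handling of infinite or NaN results that a library implementation of complex multiplication may apply, so results can differ in those edge cases.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;  // hypothetical alias, for brevity

// Element-wise complex product, stored in place in a.
// Only the common prefix is processed if the sizes differ.
void multiply_manual(std::vector<cfloat>& a, const std::vector<cfloat>& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        // Read all four components before writing anything back.
        const float ar = a[i].real(), ai = a[i].imag();
        const float br = b[i].real(), bi = b[i].imag();
        // (ar + i*ai) * (br + i*bi) = (ar*br - ai*bi) + i*(ar*bi + ai*br)
        a[i] = cfloat(ar * br - ai * bi, ar * bi + ai * br);
    }
}
```

Reading both operands into locals first avoids the need for the separate tmp variable. Whether a particular gcc version vectorizes this variant is best confirmed with -fopt-info-vec-all rather than assumed.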

Optimizing the element-wise product of two std::vector<std::complex<float>> allocated with fftw_malloc()
