Studying the FFT - why is it not fast?

Time: 2018-03-12 13:50:28

Tags: c++ signal-processing fft dft

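I'm not sure whether this is more of a math question or a programming question. If it belongs with math, please tell me.

I know there are plenty of FFT projects that are free to use. But I am trying to understand the FFT method itself, just for fun and to learn it. So I wrote two algorithms, a DFT and an FFT, to compare them.

But there is a problem with my FFT. There seems to be no big difference in efficiency. My FFT is only slightly faster than the DFT (in some cases it is two times faster, but that is the maximum speedup).

Most articles about the FFT mention bit reversal. But I don't see any reason to use bit reversal. Maybe that is the problem; I don't understand it. Please help me. What am I doing wrong?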

Here is my code (you can copy it into an online compiler and see how it works):

#include <complex>
#include <iostream>
#include <math.h>
#include <cmath>
#include <vector>
#include <chrono>
#include <ctime>

float _Pi = 3.14159265;
float sampleRate = 44100;
float resolution = 4;
float _SRrange = sampleRate / resolution; // I divide the sample rate to make the loop smaller,
                                          // just to run the tests faster
float bufferSize = 512;

// Clock class measures the time taken to execute the whole loop:
class Clock
{
    public:
        Clock() { start = std::chrono::high_resolution_clock::now(); }
        ~Clock() {}

        float secondsElapsed() // note: despite its name, this returns elapsed microseconds
        {
            auto stop = std::chrono::high_resolution_clock::now();
            return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
        }
        void reset() { start = std::chrono::high_resolution_clock::now(); }

    private: 
        std::chrono::time_point<std::chrono::high_resolution_clock> start;
};


// Function to calculate magnitude of complex number:
float _mag_Hf(std::complex<float> sf);

// Function to calculate exp(-j*2*PI*n*k / sampleRate) - where "j" is the imaginary unit:
std::complex<float> _Wnk_Nc(float n, float k);

// Function to calculate exp(-j*2*PI*k / sampleRate):
std::complex<float> _Wk_Nc(float k);



int main() {
  float scaleFFT = 512; // divide and conquer - if it's "1" then the whole algorithm is simply a DFT
            // I wonder what the maximum of that value is. I always thought it should be equal to the
            // buffer size (number of samples), but above some value it starts to run slower than the DFT

  std::vector<float> inputSignal; // array of input signal
  inputSignal.resize(bufferSize); // how many samples we will use to calculate the Fourier Transform

  std::vector<std::complex<float>> _Sf; // array to store Fourier Transform value for each measured frequency bin
  _Sf.resize(scaleFFT); // resize it to size which we need.

  std::vector<std::complex<float>> _Hf_Db_vect; // array to store the magnitude (in logarithmic dB scale)
                                                // for each measured frequency bin
  _Hf_Db_vect.resize(_SRrange); // resize it so it can store a value for each measured frequency

  std::complex<float> _Sf_I_half; // complex to calculate first half of freq range
                                  // from 1 to Nyquist  (sampleRate/2)

  std::complex<float> _Sf_II_half; // complex to calculate second half of freq range
                                   //from Nyquist to sampleRate



        for(int i=0; i<(int)_Sf.size(); i++)
            inputSignal[i]  = cosf((float)i/_Pi); // fill the input signal with some data, it doesn't matter what


  Clock _time; // Start measure time

for(int freqBinK=0; freqBinK < _SRrange/2; freqBinK++) // calculate all frequencies (divided by 2 for the two halves)
    {
        for(int i=0; i<(int)_Sf.size(); i++) _Sf[i]  = 0.0f; // clear all values; each pass needs to start from zero

        for (int n=0; n<bufferSize/_Sf.size(); ++n) // Here I take all samples in buffer
        {
            std::complex<float> _W = _Wnk_Nc(_Sf.size()*(float)n, freqBinK);

            for(int i=0; i<(int)_Sf.size(); i++) // Finally here is my divide and conquer
                _Sf[i]  += inputSignal[_Sf.size()*n  +i] * _W; // And I see no reason to use any bit reversal, where should it be????
        }

        std::complex<float> _Wk = _Wk_Nc(freqBinK);

        _Sf_I_half = 0.0f;
        _Sf_II_half = 0.0f;

        for(int z=0; z<(int)_Sf.size()/2; z++) // here I calculate Fourier transform for each freq
        {
            _Sf_I_half += _Wk_Nc(2.0f * (float)z * freqBinK) * (_Sf[2*z]  + _Wk * _Sf[2*z+1]); // First half - to Nyquist
            _Sf_II_half += _Wk_Nc(2.0f * (float)z *freqBinK) * (_Sf[2*z]  - _Wk * _Sf[2*z+1]); // Second half - to SampleRate
            // I also don't see the need for bit reversal here, where should it be??? :)
        }

        // Calculate magnitude in dB scale
        _Hf_Db_vect[freqBinK] = _mag_Hf(_Sf_I_half); // First half
        _Hf_Db_vect[freqBinK + _SRrange/2] = _mag_Hf(_Sf_II_half); // Second half
    }
  std::cout << _time.secondsElapsed() << std::endl; // time measured after execution of the whole loop
}


float _mag_Hf(std::complex<float> sf)
{
float _Re_2;
float _Im_2;
    _Re_2 = sf.real() * sf.real();
    _Im_2 = sf.imag() * sf.imag();
    return 20*log10(pow(_Re_2 + _Im_2, 0.5f)); // transform the magnitude to logarithmic dB scale
}

std::complex<float> _Wnk_Nc(float n, float k)
{
    std::complex<float> _Wnk_Ncomp;
    _Wnk_Ncomp.real(cosf(-2.0f * _Pi * (float)n * k / sampleRate));
    _Wnk_Ncomp.imag(sinf(-2.0f * _Pi * (float)n * k / sampleRate));
    return _Wnk_Ncomp;
}

std::complex<float> _Wk_Nc(float k)
{
    std::complex<float> _Wk_Ncomp;
    _Wk_Ncomp.real(cosf(-2.0f * _Pi * k / sampleRate));
    _Wk_Ncomp.imag(sinf(-2.0f * _Pi * k / sampleRate));
    return _Wk_Ncomp;
}

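3 Answers:

Answer 0: (score: 1)

One major mistake you are making is computing the butterfly weights (the twiddle factors) on the fly, including the sin/cos calls (in _Wnk_Nc()). A sin/cos call typically takes 10 to 100 clock cycles, while the rest of a butterfly is just mul and add, which take only a few cycles each, so it pays to factor the trigonometry out of the loops. All fast FFT implementations do this as part of an initialization step (usually called "plan creation" or similar). See, for example, FFTW and KissFFT.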
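
To make the precomputation idea concrete, here is a minimal sketch of a twiddle-factor table that is built once, before any transform runs. The helper name `makeTwiddles` and the power-of-two size restriction are assumptions made for illustration; they are not taken from the question's code, nor from FFTW or KissFFT.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds W_N^k = exp(-2*pi*i*k/N) for k = 0 .. N/2 - 1.
// All sin/cos calls happen here, once; the transform loops can then look the
// factors up and only multiply and add.
std::vector<std::complex<float>> makeTwiddles(std::size_t n)
{
    assert(n >= 2 && (n & (n - 1)) == 0); // assumption: radix-2, power-of-two size
    const double pi = std::acos(-1.0);
    std::vector<std::complex<float>> w(n / 2);
    for (std::size_t k = 0; k < n / 2; ++k) {
        const double angle = -2.0 * pi * static_cast<double>(k) / static_cast<double>(n);
        w[k] = std::complex<float>(static_cast<float>(std::cos(angle)),
                                   static_cast<float>(std::sin(angle)));
    }
    return w;
}
```

With such a table, a call like `_Wnk_Nc(...)` inside the hot loop would become an indexed read, provided the transform size (rather than the sample rate) is used as N.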

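Answer 1: (score: 1)

In addition to the "precompute the butterfly weights" optimization mentioned above, most FFT implementations also use SIMD instructions to vectorize the code.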

  

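// I also don't see the need for bit reversal here, where should it be???

The first butterfly loop should run over bit-reversed indices. These indices are usually produced implicitly by the recursion, but in a loop-based solution computing them is also costly, so it is again better to precompute them in the plan.

Combining these optimizations can give a speedup of roughly 100x.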
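
The following sketch shows one way the bit-reversed indices and the twiddle table might be precomputed together and then used by an iterative radix-2 FFT. `FftPlan`, `makePlan` and `fft` are hypothetical names, and the size is assumed to be a power of two; real libraries organize their plans differently.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cfloat = std::complex<float>;

// Hypothetical "plan": everything that depends only on the transform size n.
struct FftPlan {
    std::size_t n;
    std::vector<std::size_t> rev; // rev[i] = i with its log2(n) bits reversed
    std::vector<cfloat> twiddle;  // twiddle[k] = exp(-2*pi*i*k/n), k < n/2
};

FftPlan makePlan(std::size_t n)
{
    assert(n >= 2 && (n & (n - 1)) == 0); // assumption: power-of-two size
    FftPlan plan{n, std::vector<std::size_t>(n), std::vector<cfloat>(n / 2)};
    std::size_t bits = 0;
    while ((std::size_t{1} << bits) < n) ++bits;
    plan.rev[0] = 0;
    for (std::size_t i = 1; i < n; ++i) {
        // Reuse the reversal of i/2: shift it right by one and move the
        // lowest bit of i to the highest position.
        plan.rev[i] = (plan.rev[i >> 1] >> 1) | ((i & 1) << (bits - 1));
    }
    const double pi = std::acos(-1.0);
    for (std::size_t k = 0; k < n / 2; ++k) {
        const double a = -2.0 * pi * double(k) / double(n);
        plan.twiddle[k] = cfloat(float(std::cos(a)), float(std::sin(a)));
    }
    return plan;
}

// In-place iterative radix-2 decimation-in-time FFT: log2(n) passes of n/2
// butterflies each, with no trigonometric calls inside the loops.
void fft(const FftPlan& plan, std::vector<cfloat>& x)
{
    assert(x.size() == plan.n);
    const std::size_t n = plan.n;
    // Bit-reversal permutation: puts the input in the order the butterflies need.
    for (std::size_t i = 0; i < n; ++i) {
        if (i < plan.rev[i]) std::swap(x[i], x[plan.rev[i]]);
    }
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const std::size_t half = len / 2;
        const std::size_t step = n / len; // stride into the size-n twiddle table
        for (std::size_t start = 0; start < n; start += len) {
            for (std::size_t j = 0; j < half; ++j) {
                const cfloat u = x[start + j];
                const cfloat v = x[start + j + half] * plan.twiddle[j * step];
                x[start + j] = u + v;
                x[start + j + half] = u - v;
            }
        }
    }
}
```

The plan would be created once per size and reused for every buffer, so the cost of building `rev` and `twiddle` is amortized over many transforms.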

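Answer 2: (score: 0)

Most fast FFT implementations either use a lookup table of precomputed twiddle factors, or use a simple recurrence to rotate the twiddle factors on the fly, rather than calling trigonometric math library functions inside the FFT's inner loop.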

For large FFTs, using a trig recurrence formula is less likely to thrash the data cache on contemporary processors.
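
As an illustration of the recurrence approach, here is a minimal sketch that computes a single DFT bin while evaluating only one cos/sin pair; the twiddle factor is rotated by a fixed step on each iteration. The name `dftBin` is hypothetical, and the accumulation is done in double precision on the assumption that this keeps the rounding error of the repeated multiplication small for short buffers.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: one DFT bin X[k] = sum_n x[n] * exp(-2*pi*i*n*k/N),
// where N = x.size(). The inner loop contains no trigonometric calls.
std::complex<double> dftBin(const std::vector<float>& x, std::size_t k)
{
    assert(!x.empty());
    const double pi = std::acos(-1.0);
    const double angle = -2.0 * pi * double(k) / double(x.size());
    const std::complex<double> step(std::cos(angle), std::sin(angle)); // W^k
    std::complex<double> w(1.0, 0.0);   // starts at W^0
    std::complex<double> sum(0.0, 0.0);
    for (float sample : x) {
        sum += double(sample) * w;
        w *= step;                      // W^((n+1)k) = W^(nk) * W^k
    }
    return sum;
}
```

Rounding error accumulates with every multiplication, so implementations that use a recurrence over long runs typically resynchronize with an exact value from time to time or use a more stable update formula.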