我不确定它是否是更多数学或更多编程问题。如果是数学,请告诉我。
我知道有很多可以免费使用的FFT项目。但我试着理解FFT方法。只是为了好玩和学习它。所以我做了两种算法--DFT和FFT,来比较它们。
但我的FFT有问题。似乎效率没有太大差异。我的FFT只比DFT快一点(在某些情况下,它的速度提高了两倍,但是它的最大加速度)
在大多数关于FFT的文章中,有一些关于比特反转的内容。但是我没有看到使用位反转的原因。可能是这样的。我不明白。请帮我。我做错了什么?
这是我的代码(您可以在此处复制并查看其工作原理 - online compiler):
#include <complex>
#include <iostream>
#include <math.h>
#include <cmath>
#include <vector>
#include <chrono>
#include <ctime>
float _Pi = 3.14159265;
float sampleRate = 44100;
float resolution = 4;
float _SRrange = sampleRate / resolution; // I devide Sample Rate to make the loop smaller,
//just to perform tests faster
float bufferSize = 512;
// Clock class is for measure time to execute whole loop:
class Clock
{
public:
Clock() { start = std::chrono::high_resolution_clock::now(); }
~Clock() {}
float secondsElapsed()
{
auto stop = std::chrono::high_resolution_clock::now();
return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
void reset() { start = std::chrono::high_resolution_clock::now(); }
private:
std::chrono::time_point<std::chrono::high_resolution_clock> start;
};
// Function to calculate magnitude of complex number:
float _mag_Hf(std::complex<float> sf);
// Function to calculate exp(-j*2*PI*n*k / sampleRate) - where "j" is imaginary number:
std::complex<float> _Wnk_Nc(float n, float k);
// Function to calculate exp(-j*2*PI*k / sampleRate):
std::complex<float> _Wk_Nc(float k);
int main() {
float scaleFFT = 512; // devide and conquere - if it's "1" then whole algorhitm is just simply DFT
// I wonder what is the maximum of that value. I alvays thought it should be equal to
// buffer size (number o samples) but above some value it start to work slower then DFT
std::vector<float> inputSignal; // array of input signal
inputSignal.resize(bufferSize); // how many sample we will use to calculate Fourier Transform
std::vector<std::complex<float>> _Sf; // array to store Fourier Transform value for each measured frequency bin
_Sf.resize(scaleFFT); // resize it to size which we need.
std::vector<std::complex<float>> _Hf_Db_vect; //array to store magnitude (in logarythmic dB scale)
//for each measured frequency bin
_Hf_Db_vect.resize(_SRrange); //resize it to make it able to store value for each measured freq value
std::complex<float> _Sf_I_half; // complex to calculate first half of freq range
// from 1 to Nyquist (sampleRate/2)
std::complex<float> _Sf_II_half; // complex to calculate second half of freq range
//from Nyquist to sampleRate
for(int i=0; i<(int)_Sf.size(); i++)
inputSignal[i] = cosf((float)i/_Pi); // fill the input signal with some data, no matter
Clock _time; // Start measure time
for(int freqBinK=0; freqBinK < _SRrange/2; freqBinK++) // start calculate all freq (devide by 2 for two halves)
{
for(int i=0; i<(int)_Sf.size(); i++) _Sf[i] = 0.0f; // clean all values, for next loop we need all values to be zero
for (int n=0; n<bufferSize/_Sf.size(); ++n) // Here I take all samples in buffer
{
std::complex<float> _W = _Wnk_Nc(_Sf.size()*(float)n, freqBinK);
for(int i=0; i<(int)_Sf.size(); i++) // Finally here is my devide and conquer
_Sf[i] += inputSignal[_Sf.size()*n +i] * _W; // And I see no reason to use any bit reversal, how it shoul be????
}
std::complex<float> _Wk = _Wk_Nc(freqBinK);
_Sf_I_half = 0.0f;
_Sf_II_half = 0.0f;
for(int z=0; z<(int)_Sf.size()/2; z++) // here I calculate Fourier transform for each freq
{
_Sf_I_half += _Wk_Nc(2.0f * (float)z * freqBinK) * (_Sf[2*z] + _Wk * _Sf[2*z+1]); // First half - to Nyquist
_Sf_II_half += _Wk_Nc(2.0f * (float)z *freqBinK) * (_Sf[2*z] - _Wk * _Sf[2*z+1]); // Second half - to SampleRate
// also don't see need to use reversal bit, where it shoul be??? :)
}
// Calculate magnitude in dB scale
_Hf_Db_vect[freqBinK] = _mag_Hf(_Sf_I_half); // First half
_Hf_Db_vect[freqBinK + _SRrange/2] = _mag_Hf(_Sf_II_half); // Second half
}
std::cout << _time.secondsElapsed() << std::endl; // time measuer after execution of whole loop
}
float _mag_Hf(std::complex<float> sf)
{
float _Re_2;
float _Im_2;
_Re_2 = sf.real() * sf.real();
_Im_2 = sf.imag() * sf.imag();
return 20*log10(pow(_Re_2 + _Im_2, 0.5f)); //transform magnitude to logarhytmic dB scale
}
std::complex<float> _Wnk_Nc(float n, float k)
{
std::complex<float> _Wnk_Ncomp;
_Wnk_Ncomp.real(cosf(-2.0f * _Pi * (float)n * k / sampleRate));
_Wnk_Ncomp.imag(sinf(-2.0f * _Pi * (float)n * k / sampleRate));
return _Wnk_Ncomp;
}
std::complex<float> _Wk_Nc(float k)
{
std::complex<float> _Wk_Ncomp;
_Wk_Ncomp.real(cosf(-2.0f * _Pi * k / sampleRate));
_Wk_Ncomp.imag(sinf(-2.0f * _Pi * k / sampleRate));
return _Wk_Ncomp;
}
答案 0 :(得分:1)
您正在犯的一个重大错误是动态计算蝴蝶重量(包括sin
和cos
)(在_Wnk_Nc()
中)。 sin
和cos
通常需要10到100个时钟周期,而其他蝶形运算只需要mul和add,这只需要几个周期,因此需要考虑这些因素。所有快速FFT实现都是初始化步骤的一部分(通常称为&#34;计划创建&#34;或类似)。参见例如FFTW和KissFFT。
答案 1 :(得分:1)
除了上述&#34;预先计算蝴蝶重量&#34;优化,大多数FFT实现也使用SIMD
指令来矢量化代码。
//还没有看到需要使用反转位,它应该在哪里?
第一个蝶形循环应该是反向位索引。这些索引通常在递归内计算,但对于循环解决方案,计算这些索引的成本也很高,因此在计划中预先计算它们也会更好。
结合这些优化方法可以实现大约100倍的加速
答案 2 :(得分:0)
大多数快速FFT实现要么使用预先计算的旋转因子的查找表,要么使用简单的递归来动态旋转旋转因子,而不是在FFT内循环内调用三角数学库函数。
对于大型FFT,使用触发递归公式不太可能破坏当代处理器上的数据缓存。