Do CUFFT and FFTW give different results in 2D batch mode?

Time: 2013-06-02 04:34:39

Tags: cuda gpu fft fftw cufft

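When I perform one FFT at a time, FFTW and CUFFT give comparable numerical results. However, when I use batch mode to perform multiple FFTs, my FFTW and CUFFT results look nothing alike.

Let's take a simple example...

Setup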

int howMany = 2;
int nRows = 4;
int nCols = 4;
int n[2] = {nRows, nCols};
float* h_in = (float*)malloc(sizeof(float) * nRows*nCols*howMany);
for(int i=0; i<(nRows*nCols*howMany); i++){ //initialize h_in to [0 1 2 3 4 ...]
    h_in[i] = (float)i;
    printf("h_in[%d] = %f \n", i, h_in[i]);
}

FFTW plan

fftwf_plan forwardPlan = fftwf_plan_many_dft_r2c(2, //rank
                                                 n, //dimensions = {nRows, nCols}
                                                 howMany, //howmany
                                                 h_in, //in
                                                 0, //inembed
                                                 howMany, //istride
                                                 1, //idist
                                                 h_freq, //out
                                                 0, //onembed
                                                 howMany, //ostride
                                                 1, //odist
                                                 FFTW_PATIENT /*flags*/);
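
To make the interleaved layout concrete, here is a minimal sketch of how the stride and dist arguments map a (batch, row, column) triple to a flat array index. It assumes the addressing rule from the FFTW manual for the advanced interface with inembed == NULL (element j of transform k sits at j*stride + k*dist). The helper name strided_index is hypothetical and is not part of FFTW or CUFFT.

```c
#include <assert.h>
#include <stddef.h>

/* Flat index of element (row, col) of transform `batch`, assuming the
 * advanced-layout rule with no padding (inembed == NULL):
 *     index = batch * dist + (row * n_cols + col) * stride
 * Hypothetical helper for illustration only. */
static size_t strided_index(int batch, int row, int col,
                            int n_cols, int stride, int dist)
{
    assert(batch >= 0 && row >= 0 && col >= 0 && col < n_cols);
    return (size_t)batch * (size_t)dist
         + ((size_t)row * (size_t)n_cols + (size_t)col) * (size_t)stride;
}
```

With stride = howMany = 2 and dist = 1, batch 0 reads the even entries of h_in (0, 2, 4, ..., 30) and batch 1 reads the odd entries (1, 3, ..., 31). With stride = 1 and dist = 16, batch 0 would instead read h_in[0..15] and batch 1 would read h_in[16..31].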

CUFFT plan

CHECK_CUFFT(cufftPlanMany(&forwardPlan,
              2, //rank
              n, //dimensions = {nRows, nCols}
              0, //inembed
              howMany, //istride
              1, //idist
              0, //onembed
              howMany, //ostride
              1, //odist
              CUFFT_R2C, //cufftType
              howMany /*batch*/));

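Results

When I use howMany = 1, the CUFFT and FFTW results match. However, things get messier when I use howMany = 2 with istride = ostride = 2, so that the two FFTs are interleaved in memory. When I change howMany from 1 to 2, the CUFFT results stay essentially the same, but the FFTW results change completely. My hunch is that FFTW is correct and CUFFT is wrong here.

FFTW, howMany = 2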

fftw h_freq[0][0,1] = 240.000000,0.000000 
fftw h_freq[1][0,1] = 256.000000,0.000000 
fftw h_freq[2][0,1] = -16.000000,16.000000 
fftw h_freq[3][0,1] = -16.000000,16.000000 
fftw h_freq[4][0,1] = -16.000000,0.000000 
fftw h_freq[5][0,1] = -16.000000,0.000000 
fftw h_freq[6][0,1] = -64.000000,64.000000 
fftw h_freq[7][0,1] = -64.000000,64.000000 
fftw h_freq[8][0,1] = 0.000000,0.000000 
...
fftw h_freq[31][0,1] = 0.000000,0.000000 

CUFFT, howMany = 2

cufft h_freq[0].(x,y) = 120.000000,0.000001 
cufft h_freq[1].(x,y) = -8.000001,7.999996 
cufft h_freq[2].(x,y) = -8.000000,-0.000001 
cufft h_freq[3].(x,y) = -32.000000,32.000000 
cufft h_freq[4].(x,y) = 0.000000,-0.000000 
cufft h_freq[5].(x,y) = -0.000001,0.000001  
cufft h_freq[6].(x,y) = -32.000000,-0.000000 
cufft h_freq[7].(x,y) = 0.000000,0.000000  
cufft h_freq[8].(x,y) = -0.000000,0.000000 
...
cufft h_freq[31].(x,y) = 0.000000,0.000000 
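
One way to check which memory layout each library actually used is to compute a few bins by hand. The sketch below is a naive 4x4 forward DFT, not the FFTW or CUFFT algorithm. Because the transform length is 4, every twiddle factor is a power of -i, so the sums can be done exactly in integer arithmetic. The names cplx_i, mul_twiddle4 and dft4x4_bin are hypothetical, and the input is assumed to hold the integer ramp 0, 1, 2, ... from the setup above.

```c
#include <assert.h>

/* Exact integer complex number; valid here because the inputs are integers. */
typedef struct { long re, im; } cplx_i;

/* Multiply z by (-i)^k, i.e. by the length-4 twiddle exp(-2*pi*i*k/4). */
static cplx_i mul_twiddle4(cplx_i z, int k)
{
    cplx_i out;
    switch (k & 3) {
    case 0:  out = z;                         break;
    case 1:  out.re =  z.im; out.im = -z.re;  break;  /* z * (-i) */
    case 2:  out.re = -z.re; out.im = -z.im;  break;  /* z * (-1) */
    default: out.re = -z.im; out.im =  z.re;  break;  /* z * (+i) */
    }
    return out;
}

/* Bin (kr, kc) of the 4x4 forward DFT of one batch, where element (r, c)
 * of that batch is assumed to live at in[batch*dist + (r*4 + c)*stride]. */
static cplx_i dft4x4_bin(const float *in, int batch, int stride, int dist,
                         int kr, int kc)
{
    cplx_i sum = {0, 0};
    assert(batch >= 0 && stride > 0);
    for (int r = 0; r < 4; r++) {
        for (int c = 0; c < 4; c++) {
            cplx_i x = { (long)in[batch * dist + (r * 4 + c) * stride], 0 };
            x = mul_twiddle4(x, kr * r + kc * c);
            sum.re += x.re;
            sum.im += x.im;
        }
    }
    return sum;
}
```

For the interleaved layout (stride = 2, dist = 1), batch 0 gives 240 for bin (0,0), -16+16i for bin (0,1) and -64+64i for bin (1,0), and batch 1 gives 256 for bin (0,0), which agrees with the FFTW listing. For the contiguous layout (stride = 1, dist = 16), batch 0 gives 120, -8+8i and -32+32i for the same bins, which agrees with the CUFFT listing up to rounding. This suggests that CUFFT treated the input as two contiguous 4x4 arrays rather than as interleaved data. A plausible explanation, as far as I read the cuFFT documentation, is that the stride and dist arguments are ignored whenever inembed or onembed is NULL; that reading should be verified against the documentation for the CUFFT version in use.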

What could be causing this discrepancy? Am I using CUFFT batch mode correctly?


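Additional notes

  • In the FFTW version, I initialize the h_in data after setting up the FFTW plan. That way, my h_in data is not overwritten during FFTW planning.
  • You can reproduce the problem by downloading my code: FFTW code, CUFFT code, all code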

0 answers:

No answers