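When I perform one FFT at a time, FFTW and CUFFT give comparable numerical results. However, when I use batch mode to perform multiple FFTs, my FFTW and CUFFT results look nothing alike.
Let's take a simple example...
Setup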
int howMany = 2;
int nRows = 4;
int nCols = 4;
int n[2] = {nRows, nCols};
float* h_in = (float*)malloc(sizeof(float) * nRows*nCols*howMany);
for(int i=0; i<(nRows*nCols*howMany); i++){ //initialize h_in to [0 1 2 3 4 ...]
    h_in[i] = (float)i;
    printf("h_in[%d] = %f \n", i, h_in[i]);
}
FFTW plan
fftwf_plan forwardPlan = fftwf_plan_many_dft_r2c(2, //rank
                                                 n, //dimensions = {nRows, nCols}
                                                 howMany, //howmany
                                                 h_in, //in
                                                 0, //inembed
                                                 howMany, //istride
                                                 1, //idist
                                                 h_freq, //out
                                                 0, //onembed
                                                 howMany, //ostride
                                                 1, //odist
                                                 FFTW_PATIENT /*flags*/);
CUFFT plan
CHECK_CUFFT(cufftPlanMany(&forwardPlan,
                          2, //rank
                          n, //dimensions = {nRows, nCols}
                          0, //inembed
                          howMany, //istride
                          1, //idist
                          0, //onembed
                          howMany, //ostride
                          1, //odist
                          CUFFT_R2C, //cufftType
                          howMany /*batch*/));
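When I use howMany = 1, the CUFFT and FFTW results match. Things get more confusing when I use howMany = 2 with istride = ostride = 2, so that the two FFTs are interleaved in memory. When I change howMany from 1 to 2, the CUFFT results are essentially unchanged, but the FFTW results change completely. My hunch is that FFTW is correct and CUFFT is wrong here.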
FFTW, howMany = 2
fftw h_freq[0][0,1] = 240.000000,0.000000
fftw h_freq[1][0,1] = 256.000000,0.000000
fftw h_freq[2][0,1] = -16.000000,16.000000
fftw h_freq[3][0,1] = -16.000000,16.000000
fftw h_freq[4][0,1] = -16.000000,0.000000
fftw h_freq[5][0,1] = -16.000000,0.000000
fftw h_freq[6][0,1] = -64.000000,64.000000
fftw h_freq[7][0,1] = -64.000000,64.000000
fftw h_freq[8][0,1] = 0.000000,0.000000
...
fftw h_freq[31][0,1] = 0.000000,0.000000
CUFFT, howMany = 2
cufft h_freq[0].(x,y) = 120.000000,0.000001
cufft h_freq[1].(x,y) = -8.000001,7.999996
cufft h_freq[2].(x,y) = -8.000000,-0.000001
cufft h_freq[3].(x,y) = -32.000000,32.000000
cufft h_freq[4].(x,y) = 0.000000,-0.000000
cufft h_freq[5].(x,y) = -0.000001,0.000001
cufft h_freq[6].(x,y) = -32.000000,-0.000000
cufft h_freq[7].(x,y) = 0.000000,0.000000
cufft h_freq[8].(x,y) = -0.000000,0.000000
...
cufft h_freq[31].(x,y) = 0.000000,0.000000
What could be causing this discrepancy? Am I using CUFFT batch mode correctly?
Other notes
[...] the h_in data. That way, my h_in data is not overwritten during FFTW planning.