cuFFT: calling FFTs in a loop to handle larger batch sizes

Time: 2016-06-09 08:47:04

Tags: cuda fft cufft

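I am currently trying to run multiple FFTs in a loop to get around the cuFFT plan limit of roughly 128 million elements. So, for example, each loop iteration would process up to 128 million elements.

My program works for a single FFT call, but the loop does not seem to work. I suspect the problem is in how I offset the FFTs. Here is a snippet of how I do it: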

cufftComplex *d_signal;
checkCudaErrors(cudaMalloc((void **)&d_signal, mem_size));
cufftComplex *d_filter_kernel;
checkCudaErrors(cudaMalloc((void **)&d_filter_kernel, mem_size));

int rankSize = 2;
int rank[2];
rank[0] = TempSearchSizeY; rank[1] = TempSearchSizeX;
int FFTPlanSize = 500;
cufftHandle planinitial;
cufftResult r;
r = cufftPlanMany(&planinitial, rankSize, rank, NULL, 1, 0, NULL, 1, 0, CUFFT_C2C, FFTPlanSize);
int NrOfFFTRuns = ceil(loadsize / FFTPlanSize);
int FFTOffset = 0;

    checkCudaErrors(cudaMemcpy(d_signal, imageNew, sizeof(Complex)*TempSearchArea*loadsize, cudaMemcpyHostToDevice));
    checkCudaErrors(cudaMemcpy(d_filter_kernel, tempNew, sizeof(Complex)*TempSearchArea*loadsize, cudaMemcpyHostToDevice));


    for (int a = 0; a < NrOfFFTRuns; a++) {
        FFTOffset = FFTPlanSize*a;
        r = cufftExecC2C(planinitial, (cufftComplex *)&d_signal[FFTOffset], (cufftComplex *)&d_signal[FFTOffset], CUFFT_FORWARD);
        PrintFFTPlanStatus(r);
        r = cufftExecC2C(planinitial, (cufftComplex *)&d_filter_kernel[FFTOffset], (cufftComplex *)&d_filter_kernel[FFTOffset], CUFFT_FORWARD);
        PrintFFTPlanStatus(r);
        cout << "Run initial" << endl;
    }

The code above returns wrong results. Can someone help me fix the problem?

1 answer:

Answer 0 (score: 1)

I figured it out myself.

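I forgot to multiply the offset by the number of elements in each transform of the batch (TempSearchSizeY * TempSearchSizeX). Each loop iteration processes FFTPlanSize transforms, and each transform covers TempSearchSizeY * TempSearchSizeX complex elements, so the offset should be

offset = a * element size * batch size

whereas my code used only

offset = a * batch size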
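
The corrected offset arithmetic can be sketched as a small host-side helper. This is only an illustration, not the asker's actual code: `FftChunk` and `planChunks` are hypothetical names, and the helper deliberately makes no cuFFT calls so it can be checked on the CPU. It also sidesteps a second pitfall that may apply here: if `loadsize` is an integer, `ceil(loadsize / FFTPlanSize)` truncates before `ceil` is applied, so a trailing partial batch would be skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: one chunk of work for a single cufftExecC2C call.
struct FftChunk {
    std::size_t elementOffset; // offset into the device array, in complex elements
    std::size_t batchCount;    // number of 2D transforms covered by this chunk
};

// Split totalTransforms 2D FFTs of size sizeY x sizeX into chunks of at most
// planBatch transforms each. The offset is a * element size * batch size,
// expressed here as (index of first transform) * (elements per transform).
std::vector<FftChunk> planChunks(std::size_t sizeY, std::size_t sizeX,
                                 std::size_t totalTransforms,
                                 std::size_t planBatch) {
    assert(planBatch > 0);
    const std::size_t elementsPerTransform = sizeY * sizeX;
    std::vector<FftChunk> chunks;
    for (std::size_t first = 0; first < totalTransforms; first += planBatch) {
        const std::size_t remaining = totalTransforms - first;
        const std::size_t count = remaining < planBatch ? remaining : planBatch;
        chunks.push_back({first * elementsPerTransform, count});
    }
    return chunks;
}
```

In the loop from the question, each chunk's `elementOffset` would take the place of `FFTOffset`, for example `cufftExecC2C(planinitial, d_signal + chunk.elementOffset, d_signal + chunk.elementOffset, CUFFT_FORWARD)`. A final chunk whose `batchCount` is smaller than `FFTPlanSize` would presumably need its own plan created with that smaller batch count, since the batch size is fixed when the plan is created.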