About cuFFT R2C and C2R

Time: 2017-01-16 04:58:46

Tags: c++ cuda cufft

I have been using cuFFT in my research, but I have run into some problems with it. My steps are as follows:

  1. Use R2C
  2. Perform a forward FFT on the image
  3. Multiply the kernel coefficients with the complex result
  4. Use C2R
  5. Perform an inverse FFT on the product
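
A minimal CPU sketch of steps 1 to 5 in one dimension may help pin down what each stage produces. It is a stand-in under stated assumptions, not cuFFT: `r2c`, `c2r` and `filterImage1D` are hypothetical names, the two transforms are naive O(N^2) DFTs that only mimic the half-spectrum layout (N/2+1 bins) and the unnormalized inverse of cuFFT's R2C/C2R, and the kernel is assumed to be one real coefficient per stored bin.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cpx = std::complex<double>;

// Hypothetical naive R2C (O(N^2), CPU): returns bins 0..N/2 only, i.e. N/2+1 values,
// mimicking the half-spectrum layout of cuFFT's R2C output.
std::vector<cpx> r2c(const std::vector<double>& x) {
    const double pi = std::acos(-1.0);
    const std::size_t n = x.size();
    assert(n > 0);
    std::vector<cpx> X(n / 2 + 1);
    for (std::size_t k = 0; k < X.size(); ++k)
        for (std::size_t t = 0; t < n; ++t)
            X[k] += x[t] * std::polar(1.0, -2.0 * pi * double(k * t) / double(n));
    return X;
}

// Hypothetical naive C2R: unnormalized inverse (like cuFFT), with bins above N/2
// filled in from Hermitian symmetry X[N-k] = conj(X[k]).
std::vector<double> c2r(const std::vector<cpx>& X, std::size_t n) {
    assert(X.size() == n / 2 + 1);
    const double pi = std::acos(-1.0);
    std::vector<double> x(n, 0.0);
    for (std::size_t t = 0; t < n; ++t)
        for (std::size_t k = 0; k < n; ++k) {
            const cpx v = (k <= n / 2) ? X[k] : std::conj(X[n - k]);
            x[t] += (v * std::polar(1.0, 2.0 * pi * double(k * t) / double(n))).real();
        }
    return x;
}

// Steps 1-5: forward R2C, multiply each stored bin by a real coefficient,
// inverse C2R, then divide by N because neither transform normalizes.
std::vector<double> filterImage1D(const std::vector<double>& image,
                                  const std::vector<double>& coeff) {
    assert(coeff.size() == image.size() / 2 + 1);  // one coefficient per stored bin
    std::vector<cpx> X = r2c(image);
    for (std::size_t k = 0; k < X.size(); ++k) X[k] *= coeff[k];
    std::vector<double> out = c2r(X, image.size());
    for (double& v : out) v /= double(image.size());
    return out;
}
```

With every coefficient set to 1 the input should come back unchanged, which is a quick check that the divide-by-N step is in place.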

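However, when I multiply the complex result by the kernel, a serious problem occurs: the cuFFT complex result does not match the FFTW result, and there are many zeros in it. I know that the size of the R2C result is N1*(N2/2+1), but I want the full complex result. How can I solve this? That is, how do I recover the full R2C result, and how do I pass the product to C2R and get the correct answer?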
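
The missing part of the R2C output can be recovered rather than computed: for real input the full spectrum is Hermitian-symmetric, F[z][y][x] = conj(F[(NZ-z)%NZ][(NY-y)%NY][(NX-x)%NX]). The sketch below assumes the plan is created as `cufftPlan3d(&plan, NZ, NY, NX, CUFFT_R2C)`, so that the output is stored as NZ x NY x (NX/2+1) with x varying fastest, and reads any bin of the full spectrum from that half spectrum. `fullSpectrumAt`, `dft3` and `halfOf` are hypothetical helpers; the naive `dft3` is only a reference for tiny sizes.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cpx = std::complex<double>;

// Hypothetical helper: element (z, y, x) of the FULL NZ x NY x NX spectrum, read from
// an R2C half spectrum stored as NZ x NY x (NX/2+1), x fastest (assumed layout of
// cufftPlan3d(&plan, NZ, NY, NX, CUFFT_R2C)). Bins x > NX/2 come from Hermitian symmetry.
cpx fullSpectrumAt(const std::vector<cpx>& half, std::size_t NZ, std::size_t NY,
                   std::size_t NX, std::size_t z, std::size_t y, std::size_t x) {
    const std::size_t hx = NX / 2 + 1;
    assert(half.size() == NZ * NY * hx);
    if (x < hx) return half[(z * NY + y) * hx + x];
    const std::size_t zm = (NZ - z) % NZ, ym = (NY - y) % NY, xm = NX - x;
    return std::conj(half[(zm * NY + ym) * hx + xm]);
}

// Hypothetical reference: naive full 3D DFT of a real volume (O(N^2), tiny sizes only).
std::vector<cpx> dft3(const std::vector<double>& v, std::size_t NZ, std::size_t NY,
                      std::size_t NX) {
    const double pi = std::acos(-1.0);
    std::vector<cpx> F(NZ * NY * NX);
    for (std::size_t kz = 0; kz < NZ; ++kz)
        for (std::size_t ky = 0; ky < NY; ++ky)
            for (std::size_t kx = 0; kx < NX; ++kx) {
                cpx s(0.0, 0.0);
                for (std::size_t z = 0; z < NZ; ++z)
                    for (std::size_t y = 0; y < NY; ++y)
                        for (std::size_t x = 0; x < NX; ++x) {
                            const double ph = -2.0 * pi * (double(kz * z) / double(NZ) +
                                                           double(ky * y) / double(NY) +
                                                           double(kx * x) / double(NX));
                            s += v[(z * NY + y) * NX + x] * std::polar(1.0, ph);
                        }
                F[(kz * NY + ky) * NX + kx] = s;
            }
    return F;
}

// Hypothetical helper: keep only x = 0..NX/2, the part of the spectrum R2C stores.
std::vector<cpx> halfOf(const std::vector<cpx>& F, std::size_t NZ, std::size_t NY,
                        std::size_t NX) {
    const std::size_t hx = NX / 2 + 1;
    std::vector<cpx> H(NZ * NY * hx);
    for (std::size_t row = 0; row < NZ * NY; ++row)   // row = z * NY + y
        for (std::size_t x = 0; x < hx; ++x)
            H[row * hx + x] = F[row * NX + x];
    return H;
}
```

Zeros in a full-size output buffer are therefore expected: only NZ*NY*(NX/2+1) complex values are written, and the remaining bins can be reconstructed on demand instead of being stored.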

My implementation is as follows:

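    #include <cuda_runtime.h>
    #include <cufft.h>
    
    // data[i] = data2[i] * data1[i] (complex times real) for i < vectorSize
    __global__ void MultiplyKernel(cufftComplex *data, float *data1,cufftComplex *data2, unsigned vectorSize) {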
        unsigned idx = blockIdx.x*blockDim.x+threadIdx.x;
        if (idx < vectorSize){
            data[idx].x = data2[idx].x*data1[idx];
            data[idx].y = data2[idx].y*data1[idx];
        }
    }
    
    __global__ void Scale(cufftReal *data, unsigned vectorSize) {
        unsigned idx = blockIdx.x*blockDim.x+threadIdx.x;
        if (idx < vectorSize){
            data[idx] = data[idx]/vectorSize;
        }
    }
    
    void ApplyKernel1(cufftReal *data2, float *ImageBuffer, float *KernelBuffer, unsigned int NX, unsigned int NY,unsigned int NZ)
    {
          float *Akernel;
          cufftComplex *data_dev1, *data_dev2;
          cufftReal *data_dev3, *data_dev;
          cudaMalloc((void **)&Akernel, NX * NY * NZ * sizeof(float));
          cudaMalloc((void **)&data_dev3, NX * NY * NZ * sizeof(cufftReal));
          cudaMalloc((void **)&data_dev, NX * NY * NZ * sizeof(cufftReal));
          cudaMalloc((void **)&data_dev1, NX * NY * NZ * sizeof(cufftComplex));
          cudaMalloc((void **)&data_dev2, NX * NY * NZ * sizeof(cufftComplex));
          cudaMemset(data_dev, 0, NX * NY * NZ * sizeof(cufftReal));
          cudaMemset(data_dev1, 0, NX * NY * NZ * sizeof(cufftComplex));
          cudaMemset(data_dev2, 0, NX * NY * NZ * sizeof(cufftComplex));
          //cufftComplex *resultFFT = (cufftComplex*)malloc(NX * NY * NZ * sizeof(cufftComplex));
          //cufftReal *resultIFFT = (cufftReal*)malloc(NX * NY * NZ * sizeof(cufftReal));
    
          cudaMemcpy(data_dev, ImageBuffer, NX * NY * NZ * sizeof(cufftReal), cudaMemcpyHostToDevice);
    
          cufftHandle plan;
          cufftPlan3d(&plan, NZ, NY, NX, CUFFT_R2C);
          cufftExecR2C(plan, data_dev, data_dev1);
    
          //Multiply kernel (R2C wrote NZ*NY*(NX/2+1) complex values into data_dev1; the rest is still zero from cudaMemset)
          cudaMemcpy(Akernel, KernelBuffer, NX * NY * NZ * sizeof(float), cudaMemcpyHostToDevice);
          static const int BLOCK_SIZE = 1000;
          const int blockCount = (NX*NY*NZ+BLOCK_SIZE-1)/BLOCK_SIZE;
          MultiplyKernel <<<blockCount, BLOCK_SIZE>>> (data_dev2, Akernel, data_dev1, NX*NY*NZ);
    
    
          cufftDestroy(plan);
          //cufftPlan3d(&plan, NZ, NY, NX, CUFFT_C2R);
          cufftPlan3d(&plan, NZ,NY,NX, CUFFT_C2R);
          cufftExecC2R(plan, data_dev2, data_dev3 );
          Scale <<<blockCount, BLOCK_SIZE>>> (data_dev3, NX*NY*NZ);
          cudaMemcpy(data2, data_dev3, NZ * NY * NX * sizeof(cufftReal), cudaMemcpyDeviceToHost);
    
          cufftDestroy(plan);
          cudaFree(data_dev);
          cudaFree(data_dev1);
          cudaFree(data_dev2);
          cudaFree(data_dev3);
          cudaFree(Akernel);
    
    
    }
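
In the code above, `data_dev1` holds NZ*NY*(NX/2+1) meaningful complex values after `cufftExecR2C`, yet `MultiplyKernel` runs over NX*NY*NZ elements and pairs element `idx` of that half spectrum with element `idx` of `Akernel`, which is the same frequency only if `KernelBuffer` is already stored in the NZ x NY x (NX/2+1) layout. A host-side sketch with matching bounds might look like this, assuming the kernel coefficients are real and given per frequency on the full grid (x fastest). `toHalfLayout` and `multiplyHalfSpectrum` are hypothetical names, and dropping the bins x > NX/2 is only valid if the coefficients satisfy K[z][y][x] = K[(NZ-z)%NZ][(NY-y)%NY][(NX-x)%NX], as the frequency response of a real, even kernel does.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cpx = std::complex<float>;

// Hypothetical helper: repack real per-frequency coefficients stored on the full
// NZ x NY x NX grid (x fastest) into the NZ x NY x (NX/2+1) layout of the R2C output
// by dropping bins x > NX/2. Assumes K[z][y][x] == K[-z][-y][-x] (indices mod N).
std::vector<float> toHalfLayout(const std::vector<float>& full, std::size_t NZ,
                                std::size_t NY, std::size_t NX) {
    const std::size_t hx = NX / 2 + 1;
    assert(full.size() == NZ * NY * NX);
    std::vector<float> half(NZ * NY * hx);
    for (std::size_t row = 0; row < NZ * NY; ++row)   // row = z * NY + y
        for (std::size_t x = 0; x < hx; ++x)
            half[row * hx + x] = full[row * NX + x];
    return half;
}

// Hypothetical host version of MultiplyKernel: the loop covers exactly the
// NZ*NY*(NX/2+1) complex values that R2C writes, not NX*NY*NZ.
void multiplyHalfSpectrum(std::vector<cpx>& spec, const std::vector<float>& kernelHalf) {
    assert(spec.size() == kernelHalf.size());
    for (std::size_t i = 0; i < spec.size(); ++i) spec[i] *= kernelHalf[i];
}
```

On the GPU the same count, NZ*NY*(NX/2+1), would set `blockCount` for the multiply launch, while `Scale` keeps dividing by NX*NY*NZ because the C2R output is the full-size real volume.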
    

1 Answer:

Answer 0 (score: 1)

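When you multiply the result of the R2C FFT by complex numbers, the result no longer corresponds to a symmetric (Hermitian) array, which is what C2R assumes its input to be.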
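
The symmetry does survive in one useful case: when the multiplier is itself the R2C spectrum of a real kernel, because the product of two Hermitian-symmetric spectra is again Hermitian-symmetric (A[-k]B[-k] = conj(A[k]) conj(B[k]) = conj(A[k]B[k])). The 1D sketch below uses hypothetical naive `r2c`/`c2r` helpers in place of cuFFT, multiplies two half spectra bin by bin, applies C2R, divides by N, and compares the result with a direct circular convolution.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cpx = std::complex<double>;

// Hypothetical naive R2C: bins 0..N/2 of the DFT of a real signal.
std::vector<cpx> r2c(const std::vector<double>& x) {
    const double pi = std::acos(-1.0);
    const std::size_t n = x.size();
    std::vector<cpx> X(n / 2 + 1);
    for (std::size_t k = 0; k < X.size(); ++k)
        for (std::size_t t = 0; t < n; ++t)
            X[k] += x[t] * std::polar(1.0, -2.0 * pi * double(k * t) / double(n));
    return X;
}

// Hypothetical naive unnormalized C2R; bins above N/2 come from Hermitian symmetry.
std::vector<double> c2r(const std::vector<cpx>& X, std::size_t n) {
    assert(X.size() == n / 2 + 1);
    const double pi = std::acos(-1.0);
    std::vector<double> x(n, 0.0);
    for (std::size_t t = 0; t < n; ++t)
        for (std::size_t k = 0; k < n; ++k) {
            const cpx v = (k <= n / 2) ? X[k] : std::conj(X[n - k]);
            x[t] += (v * std::polar(1.0, 2.0 * pi * double(k * t) / double(n))).real();
        }
    return x;
}

// Circular convolution via half spectra: both factors come from real signals,
// so their bin-by-bin product is still a valid input for C2R.
std::vector<double> fftConvolve(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size() && !a.empty());
    std::vector<cpx> A = r2c(a);
    const std::vector<cpx> B = r2c(b);
    for (std::size_t k = 0; k < A.size(); ++k) A[k] *= B[k];
    std::vector<double> out = c2r(A, a.size());
    for (double& v : out) v /= double(a.size());   // C2R is unnormalized
    return out;
}

// Direct circular convolution, out[i] = sum_j a[j] * b[(i - j) mod n], for comparison.
std::vector<double> directConvolve(const std::vector<double>& a, const std::vector<double>& b) {
    const std::size_t n = a.size();
    std::vector<double> out(n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            out[i] += a[j] * b[(i + n - j) % n];
    return out;
}
```

This gives a circular convolution; for a linear convolution of an image with a kernel, both would normally be zero-padded to a common size before the forward transforms.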