Question

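I'm looking for a way to use cufft.h, the CUDA library that performs GPU-parallelized fast Fourier transforms.

First, I installed the CUDA libraries and cuFFT through Synaptic. Then I used the example program from NVIDIA's cufft documentation. My CUDA libraries are located in /usr/local/cuda-9.0 on my laptop.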

I added these includes:

#include <iostream>
#include <cstdio>
#include "/usr/local/cuda-9.0/include/cuda.h"
#include "/usr/local/cuda-9.0/include/cuda_runtime_api.h"
#include "/usr/local/cuda-9.0/include/cufft.h"

I compile like this:

g++ -Wall main.cpp -o main

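and I get "undefined reference" errors for every CUDA function (cudaMalloc, cudaGetLastError, and so on).

I'm quite new to working with libraries, and I don't understand what I need to do to include this CUDA/cuFFT library correctly...

The NVIDIA documentation talks about filename.cu, but I don't know what that is about...

Thanks for your time :)

N.B.: I added cuda.h and cuda_runtime_api.h after reading a forum thread (I forget which one). Apparently only cuda_runtime_api.h is necessary (I tried without cuda.h and got the same error).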

Answer 1

Here is a complete example code (it doesn't do anything useful) and a sample g++ compile command that will compile and link the code correctly:

$ cat t1338.cpp
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <cuda_runtime.h>
#include <cufft.h>

int main() {
    size_t work_size;

    int fft_sz = 32;            // Size of each FFT
    int num_ffts = 1;         // How many FFTs to do

    cufftComplex *in_buf_h, *in_buf_d, *out_buf_d;

    // Allocate buffers on host and device
    in_buf_h = new cufftComplex[fft_sz*num_ffts];
    cudaMalloc(&in_buf_d, fft_sz*num_ffts*sizeof(cufftComplex));
    cudaMalloc(&out_buf_d, fft_sz*num_ffts*sizeof(cufftComplex));
    cudaMemset(out_buf_d, 0, fft_sz*num_ffts*sizeof(cufftComplex));
    // Fill input buffer with zeros and copy to device
    memset(in_buf_h, 0, fft_sz*num_ffts*sizeof(cufftComplex));
    cudaMemcpy(in_buf_d, in_buf_h, fft_sz*num_ffts*sizeof(cufftComplex), cudaMemcpyHostToDevice);

    // Plan num_ffts of size fft_sz
    cufftHandle plan;
    cufftCreate(&plan);
    cufftMakePlan1d(plan, fft_sz, CUFFT_C2C, num_ffts, &work_size);

    // Execute the plan. We don't actually care about values.
    cufftExecC2C(plan, in_buf_d, out_buf_d, CUFFT_FORWARD);

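    // Sync the device to flush the output
    cudaDeviceSynchronize();

    // Release the plan and the host/device buffers
    cufftDestroy(plan);
    cudaFree(in_buf_d);
    cudaFree(out_buf_d);
    delete[] in_buf_h;

    return 0;
}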
$ g++ t1338.cpp -I/usr/local/cuda/include -L/usr/local/cuda/lib64 -lcudart -lcufft
$
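The listing discards the transform's output (its input is all zeros anyway). If you want to check real results, one option is to copy out_buf_d back with cudaMemcpy(..., cudaMemcpyDeviceToHost) and compare it against a plain host-side DFT. cuFFT's forward transforms are unnormalized and use a negative exponent, X_k = sum over j of x_j * exp(-2*pi*i*j*k/n). The sketch below is a minimal reference for that convention; naive_dft is a hypothetical helper written for illustration, not part of cuFFT, and it uses std::complex<float> instead of cufftComplex so it builds without CUDA.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper: naive O(n^2) forward DFT, unnormalized, exponent sign -1
// (the same convention as a CUFFT_FORWARD C2C transform).
std::vector<std::complex<float>> naive_dft(const std::vector<std::complex<float>> &x) {
    const std::size_t n = x.size();
    const double two_pi = 6.283185307179586;
    std::vector<std::complex<float>> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> acc(0.0, 0.0);  // accumulate in double for accuracy
        for (std::size_t j = 0; j < n; ++j) {
            // Reduce j*k modulo n so the angle stays in [0, 2*pi).
            const double angle = -two_pi * static_cast<double>((j * k) % n) / static_cast<double>(n);
            acc += std::complex<double>(x[j]) * std::polar(1.0, angle);
        }
        out[k] = std::complex<float>(acc);
    }
    return out;
}
```

For a 32-point input of all ones, this returns 32 in bin 0 and (up to rounding) zero in every other bin; a CUFFT_FORWARD transform of the same input should agree within float tolerance.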

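Your include statements are a bit unusual but may work as written. The form I used instead means "search the standard include paths for this file", and I then add to the standard paths with: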

-I/usr/local/cuda/include

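However, your compile command is definitely missing the necessary linking options. You need to specify the location (path) of the libraries with -L, and then name the specific libraries to link, which are both the CUDA runtime library (-lcudart) and the CUFFT library (-lcufft):

-L/usr/local/cuda/lib64 -lcudart -lcufft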

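The CUDA toolkit usually installs sample codes that include sample Makefiles you can inspect, or you can simply build those projects to see typical compile command usage.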

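As I mentioned, this source code is incomplete. It doesn't do anything useful; it is only meant to demonstrate the correct compile behavior. In particular, I have omitted proper error checking, which I recommend you include in real code.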

Depending on whether your installation created the /usr/local/cuda symlink, you may need to change the paths above to:

-L/usr/local/cuda-9.0/lib64 -lcudart -lcufft

and

-I/usr/local/cuda-9.0/include
