Question

So I have this assignment on convolution, in which I have to apply a .wav filter to another .wav file. I have to do this using CUDA. Here is my CUDA kernel:


__global__ void MyConvolveCUDA(const double* A, const double* B, double* C, int n, int m) {

    int i = threadIdx.x + blockIdx.x * blockDim.x;
    int j = threadIdx.y + blockIdx.y * blockDim.y;

    int min, max;
    if (i >= m - 1) min = i - m + 1; else min = 0;
    if (i < n - 1) max = i; else max = n - 1;

    if (j <= min) j = min;
    else if (j >= max) j = max;

    C[i] = A[i] * B[j - i];
}

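And this is the function I tried. I use a custom library to read the audio files (they are read correctly and everything works), so I will simplify the audio-file part of the code: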


void MyConvolveCUDA_Run() {

    //Let's say that 'filter' is the filter I want to apply to the 'audio' file. 'output' is the file I
    //want to export in the end. The '.samples' function accesses the samples' part of the audio file, 
    //and the 'save' function saves the file using the given name.

    int n = audio.samples.size(),
        m = filter.samples.size();

    //These are the device copies of the data I want to process.
    double* audioCUDA = nullptr;
    double* filterCUDA = nullptr;
    double* outputCUDA = nullptr;

    cudaMalloc((void **)&audioCUDA, n * sizeof(double));
    cudaMalloc((void **)&filterCUDA, n * sizeof(double));
    cudaMalloc((void **)&outputCUDA, (n + m - 1) * sizeof(double));

    cudaMemcpy(audioCUDA, audio.samples[0].data(), n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(filterCUDA, filter.samples[0].data(), m * sizeof(double), cudaMemcpyHostToDevice);

    MyConvolveCUDA << < 32, 32 >> > (audioCUDA, filterCUDA, outputCUDA, n, m);
    cudaDeviceSynchronize();

    cudaMemcpy(output.samples[0].data(), outputCUDA, (n + m - 1) * sizeof(double), cudaMemcpyDeviceToHost);

    cudaFree(audioCUDA); cudaFree(filterCUDA); cudaFree(outputCUDA);

    output.save("CUDA_output.wav");
}

Can you see what is going wrong? I wanted to inspect the arrays passed to MyConvolveCUDA, but every time I try I get an access violation error.

Thanks!

Answer 1

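You are launching the CUDA kernel MyConvolveCUDA with <<<32,32>>>, which means you are launching 32 blocks of 32 threads each (1024 threads in total). Inside the kernel you use 2D thread indices, but you only launch a 1D grid of threads.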

MyConvolveCUDA<<<M,N>>> is interpreted as MyConvolveCUDA<<<dim3(M,1,1),dim3(N,1,1)>>>, where M is the number of blocks and N is the number of threads per block; that is, threads are launched only in the x direction. As a result, threadIdx.y and blockIdx.y will always be 0.
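Since only the x index is populated in a 1D launch, one possible restructuring is to let each thread compute one output sample and loop over the overlapping filter taps. The following is a minimal sketch under that assumption, not code from the question and not a definitive fix. The names convolve1D and convolveOnDevice are hypothetical, and the block size of 256 is an arbitrary choice. The grid size is derived from the output length, so all n + m - 1 samples are covered rather than a fixed 1024 threads.

```cuda
#include <vector>
#include <cuda_runtime.h>

// One thread per output sample: C[i] = sum over j of A[j] * B[i - j].
__global__ void convolve1D(const double* A, const double* B, double* C, int n, int m) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i >= n + m - 1) return;  // threads past the end of the output do nothing

    // Keep j inside [0, n) and i - j inside [0, m).
    int jMin = (i >= m - 1) ? i - (m - 1) : 0;
    int jMax = (i < n - 1) ? i : n - 1;

    double sum = 0.0;
    for (int j = jMin; j <= jMax; ++j) {
        sum += A[j] * B[i - j];
    }
    C[i] = sum;
}

// Hypothetical host wrapper: copies the inputs to the device, runs the kernel,
// and returns the full convolution of length n + m - 1.
std::vector<double> convolveOnDevice(const std::vector<double>& a,
                                     const std::vector<double>& b) {
    if (a.empty() || b.empty()) return {};
    int n = static_cast<int>(a.size());
    int m = static_cast<int>(b.size());
    int outLen = n + m - 1;
    std::vector<double> c(outLen, 0.0);

    double* dA = nullptr;
    double* dB = nullptr;
    double* dC = nullptr;
    cudaMalloc((void**)&dA, n * sizeof(double));
    cudaMalloc((void**)&dB, m * sizeof(double));  // filter buffer sized by m
    cudaMalloc((void**)&dC, outLen * sizeof(double));

    cudaMemcpy(dA, a.data(), n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), m * sizeof(double), cudaMemcpyHostToDevice);

    int threads = 256;                               // assumed block size
    int blocks = (outLen + threads - 1) / threads;   // enough blocks for every output sample
    convolve1D<<<blocks, threads>>>(dA, dB, dC, n, m);
    cudaDeviceSynchronize();

    cudaMemcpy(c.data(), dC, outLen * sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return c;
}
```

For example, convolving {1, 2, 3} with {1, 1} this way should give {1, 3, 5, 3}.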

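If you want to launch it in 2D, you should pass dim3 values for both the grid and the block, for example MyConvolveCUDA<<<dim3(Mx,My,1),dim3(Nx,Ny,1)>>>, where Mx, My, Nx, and Ny are placeholder extents.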
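To make the difference between a 1D and a 2D launch concrete, here is a small sketch that marks every (x, y) cell a thread actually reaches and counts the marks on the host. The kernel markVisited and the helper countVisited are hypothetical names introduced for illustration only.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Each thread marks the cell addressed by its 2D global index.
__global__ void markVisited(int* visited, int width, int height) {
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    if (x < width && y < height) {
        visited[y * width + x] = 1;
    }
}

// Hypothetical helper: launches markVisited with the given configuration and
// returns how many cells of a width x height array were reached (-1 on allocation failure).
int countVisited(int width, int height, dim3 grid, dim3 block) {
    size_t cells = static_cast<size_t>(width) * height;
    size_t bytes = cells * sizeof(int);

    int* dVisited = nullptr;
    if (cudaMalloc((void**)&dVisited, bytes) != cudaSuccess) return -1;
    cudaMemset(dVisited, 0, bytes);

    markVisited<<<grid, block>>>(dVisited, width, height);
    cudaDeviceSynchronize();

    std::vector<int> hVisited(cells, 0);
    cudaMemcpy(hVisited.data(), dVisited, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dVisited);

    int count = 0;
    for (int v : hVisited) count += v;
    return count;
}
```

With a 64 x 64 array, a launch with dim3(4,1,1) blocks of dim3(16,1,1) threads should mark only the 64 cells in row y = 0, while dim3(4,4,1) blocks of dim3(16,16,1) threads should mark all 4096 cells.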

To inspect the arrays inside the kernel, you can print them from device code.
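One way to print device arrays is device-side printf, sketched below. Pointers returned by cudaMalloc refer to device memory and cannot be dereferenced from host code, which is a likely cause of the access violation when inspecting them on the host. The names printDeviceArray and dumpFirstSamples are hypothetical, and printing from a single thread is an assumption made to keep the output readable.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Debug helper: a single thread prints the first `count` values of a device array.
__global__ void printDeviceArray(const double* data, int count) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        for (int k = 0; k < count; ++k) {
            printf("data[%d] = %f\n", k, data[k]);
        }
    }
}

// Hypothetical host wrapper: copies `host` to the device, prints up to `count`
// values from inside a kernel, and reports whether everything succeeded.
bool dumpFirstSamples(const std::vector<double>& host, int count) {
    if (host.empty() || count <= 0) return false;
    if (count > static_cast<int>(host.size())) count = static_cast<int>(host.size());

    size_t bytes = host.size() * sizeof(double);
    double* dev = nullptr;
    if (cudaMalloc((void**)&dev, bytes) != cudaSuccess) return false;
    cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);

    printDeviceArray<<<1, 1>>>(dev, count);
    // Synchronizing waits for the kernel and flushes the device printf buffer.
    cudaError_t err = cudaDeviceSynchronize();

    cudaFree(dev);
    return err == cudaSuccess;
}
```

The same printDeviceArray kernel could be launched on audioCUDA or filterCUDA right before the convolution kernel to check what was actually copied to the device.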
