I have an assignment on convolution in which I have to apply a .wav filter to another .wav file, and I have to do it with CUDA. This is my CUDA kernel:
__global__ void MyConvolveCUDA(const double* A, const double* B, double* C, int n, int m) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    int j = threadIdx.y + blockIdx.y * blockDim.y;
    int min, max;
    if (i >= m - 1) min = i - m + 1; else min = 0;
    if (i < n - 1) max = i; else max = n - 1;
    if (j <= min) j = min;
    else if (j >= max) j = max;
    C[i] = A[i] * B[j - i];
}
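And this is the function in which I try to run it. I use a custom library to read the audio files (they are read correctly and that part works), so I have simplified the audio-file part of the code: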
void MyConvolveCUDA_Run() {
    //Let's say that 'filter' is the filter I want to apply to the 'audio' file. 'output' is the file I
    //want to export in the end. The '.samples' function accesses the samples' part of the audio file,
    //and the 'save' function saves the file using the given name.
    int n = audio.samples.size(),
        m = filter.samples.size();
    //These are the device copies of the data I want to process.
    double* audioCUDA = nullptr;
    double* filterCUDA = nullptr;
    double* outputCUDA = nullptr;
    cudaMalloc((void **)&audioCUDA, n * sizeof(double));
    cudaMalloc((void **)&filterCUDA, n * sizeof(double));
    cudaMalloc((void **)&outputCUDA, (n + m - 1) * sizeof(double));
    cudaMemcpy(audioCUDA, audio.samples[0].data(), n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(filterCUDA, filter.samples[0].data(), m * sizeof(double), cudaMemcpyHostToDevice);
    MyConvolveCUDA<<<32, 32>>>(audioCUDA, filterCUDA, outputCUDA, n, m);
    cudaDeviceSynchronize();
    cudaMemcpy(output.samples[0].data(), outputCUDA, (n + m - 1) * sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(audioCUDA); cudaFree(filterCUDA); cudaFree(outputCUDA);
    output.save("CUDA_output.wav");
}
Can you see what is going wrong? I wanted to inspect the arrays passed to MyConvolveCUDA, but every time I try I get an access violation error.
Thanks!
Answer 0 (score: 0)
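You are launching the CUDA kernel as MyConvolveCUDA<<<32, 32>>>, which means you are launching 32 blocks of 32 threads each (1024 threads in total). Inside the kernel you use 2D thread indexing, but you launch the threads in only one dimension.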
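A launch written as MyConvolveCUDA<<<M, N>>> is interpreted as MyConvolveCUDA<<<dim3(M,1,1), dim3(N,1,1)>>>, where M is the number of blocks and N is the number of threads per block; that is, threads are launched only in the x direction. Because of this, threadIdx.y and blockIdx.y are always 0.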
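Since only the x indices vary in a 1D launch, one option is to give each thread a single output sample and loop over the filter inside the thread. The following is a minimal sketch of that idea, not a drop-in fix: the names `convolve1D` and `convolveOnGpu` and the choice of 256 threads per block are assumptions made for illustration, the inputs are assumed to be non-empty, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// One thread per output sample: C[i] = sum over j of A[j] * B[i - j].
__global__ void convolve1D(const double* A, const double* B, double* C, int n, int m) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i >= n + m - 1) return;               // guard threads past the end of C
    int jMin = (i >= m - 1) ? i - m + 1 : 0;  // keeps i - j <= m - 1
    int jMax = (i < n - 1) ? i : n - 1;       // keeps j <= n - 1 and i - j >= 0
    double sum = 0.0;
    for (int j = jMin; j <= jMax; ++j) sum += A[j] * B[i - j];
    C[i] = sum;
}

// Hypothetical host helper: copies the inputs to the GPU, runs the kernel,
// and returns the n + m - 1 output samples.
std::vector<double> convolveOnGpu(const std::vector<double>& a, const std::vector<double>& b) {
    int n = static_cast<int>(a.size());
    int m = static_cast<int>(b.size());
    int outLen = n + m - 1;
    std::vector<double> c(outLen, 0.0);
    double *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, n * sizeof(double));
    cudaMalloc(&dB, m * sizeof(double));      // the filter buffer holds m samples
    cudaMalloc(&dC, outLen * sizeof(double));
    cudaMemcpy(dA, a.data(), n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), m * sizeof(double), cudaMemcpyHostToDevice);
    int threads = 256;                               // assumed block size
    int blocks = (outLen + threads - 1) / threads;   // enough blocks to cover every output
    convolve1D<<<blocks, threads>>>(dA, dB, dC, n, m);
    cudaMemcpy(c.data(), dC, outLen * sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return c;
}
```

The grid size is derived from the output length rather than fixed at 32 blocks, so every one of the n + m - 1 output samples gets a thread, and the bounds check keeps the surplus threads in the last block from writing past the end of C.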
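If you want to launch it in two dimensions, pass dim3 values whose y extent is greater than 1 for both the grid and the block, for example MyConvolveCUDA<<<dim3(Mx, My, 1), dim3(Nx, Ny, 1)>>>, where Mx and My are the numbers of blocks and Nx and Ny are the numbers of threads per block in the x and y directions.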
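As an illustration of a two-dimensional launch, here is a small sketch in which threadIdx.y and blockIdx.y actually take nonzero values. The kernel `fillRow2D`, the helper `runFillRow2D`, and the 16 x 16 block size are hypothetical choices for this example and have nothing to do with the convolution itself.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical demo kernel: every thread stores its own y coordinate (row)
// into a width x height array laid out row by row.
__global__ void fillRow2D(int* out, int width, int height) {
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    if (x < width && y < height) out[y * width + x] = y;
}

// Hypothetical host helper: launches a 2D grid of 2D blocks and returns the result.
std::vector<int> runFillRow2D(int width, int height) {
    std::vector<int> host(static_cast<size_t>(width) * height, -1);
    int* dev = nullptr;
    cudaMalloc(&dev, host.size() * sizeof(int));
    dim3 block(16, 16);                              // 256 threads per block (assumed size)
    dim3 grid((width + block.x - 1) / block.x,       // blocks needed along x
              (height + block.y - 1) / block.y);     // blocks needed along y
    fillRow2D<<<grid, block>>>(dev, width, height);
    cudaMemcpy(host.data(), dev, host.size() * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}
```

With this launch configuration the row index stored in each element varies from 0 to height - 1, whereas a launch of the form <<<M, N>>> would leave y at 0 for every thread.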
To inspect the arrays inside the kernel, you can print them from device code with printf.
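A minimal sketch of such a debug print follows, assuming a hypothetical kernel `printArray` and host helper `debugPrint` (neither appears in the question). A pointer returned by cudaMalloc refers to device memory and cannot be dereferenced in host code, which is a likely cause of the access violation; printing from a kernel, or copying the data back with cudaMemcpy first, avoids that.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical debug kernel: a single thread prints the first `count`
// entries of a device array.
__global__ void printArray(const double* data, int count) {
    if (threadIdx.x == 0 && blockIdx.x == 0) {
        for (int k = 0; k < count; ++k) {
            printf("data[%d] = %f\n", k, data[k]);
        }
    }
}

// Hypothetical host helper: launches the print kernel and waits for it,
// which also flushes the device-side printf buffer.
cudaError_t debugPrint(const double* deviceData, int count) {
    printArray<<<1, 1>>>(deviceData, count);
    return cudaDeviceSynchronize();
}
```

In the question's code this could be called as debugPrint(audioCUDA, 10) right after the cudaMemcpy calls; a return value other than cudaSuccess indicates that the kernel failed.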