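Udacity Parallel Programming: "unspecified launch failure" from cudaGetLastError()

Posted: 2014-09-21 20:56:08

Tags: cuda

I'm trying to complete Assignment #2 of the Udacity Parallel Programming course, and I've hit a CUDA error I can't resolve. It is thrown when I launch a kernel that is meant to split an image stored as "RGBRGBRGB" into three separate arrays, "RRR", "GGG" and "BBB". The error message, "unspecified launch failure", doesn't give me anything specific to go on, so I don't know how to fix the problem.

Here is the "main" function that kicks off the whole process. I left out everything after the point where the error occurs, so that I'm not posting the rest of my solution for people to find later.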

void your_gaussian_blur(const uchar4 * const h_inputImageRGBA, uchar4 * const d_inputImageRGBA, uchar4* const d_outputImageRGBA, const size_t numRows, const size_t numCols,
                        unsigned char *d_redBlurred, 
                        unsigned char *d_greenBlurred, 
                        unsigned char *d_blueBlurred,
                        const int filterWidth)
{

    // Maximum number of threads per block = 512 on older GPUs
    // (compute capability 1.x); stay under it for compatibility
    // MAX >= threadsX * threadsY * threadsZ
    int MAXTHREADSx = 16;
    int MAXTHREADSy = 16; // 16 x 16 x 1 = 256 <= 512
    // We want to fill the blocks so we don't waste this block's threads
    // I wonder if blocks can intermix on a physical core?
    // Either way this method makes things "clean": one thread per pixel
    int nBlockX = numCols / MAXTHREADSx + 1;
    int nBlockY = numRows / MAXTHREADSy + 1;

    const dim3 blockSize(MAXTHREADSx, MAXTHREADSy, 1);
    const dim3 gridSize(nBlockX, nBlockY, 1);

    separateChannels<<<gridSize, blockSize>>>(
        h_inputImageRGBA,
        numRows,
        numCols,
        d_red,
        d_green,
        d_blue);

  // Call cudaDeviceSynchronize(), then call checkCudaErrors() immediately after
  // launching your kernel to make sure that you didn't make any mistakes.
  cudaDeviceSynchronize(); checkCudaErrors(cudaGetLastError());
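A side note on the grid size: numCols / MAXTHREADSx + 1 launches one extra column (or row) of blocks whenever the dimension is already a multiple of 16. That is harmless here because the kernel returns early for out-of-range threads, but the usual idiom is ceiling division. A minimal sketch; ceilDiv and gridFor are hypothetical helper names, not part of the course template:

```cuda
#include <cuda_runtime.h>  // dim3
#include <cstddef>         // size_t

// Ceiling division: the smallest number of blocks of size d that covers n items.
// Hypothetical helper, not from the course template.
static unsigned int ceilDiv(size_t n, unsigned int d)
{
    return static_cast<unsigned int>((n + d - 1) / d);
}

// Grid that covers a numRows x numCols image with one thread per pixel.
static dim3 gridFor(size_t numRows, size_t numCols, dim3 block)
{
    return dim3(ceilDiv(numCols, block.x),  // x dimension covers columns
                ceilDiv(numRows, block.y),  // y dimension covers rows
                1);
}
```

Used as const dim3 gridSize = gridFor(numRows, numCols, blockSize);, a 512 x 512 image gets a 32 x 32 grid, where the formula above gives 33 x 33.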

Here is the separateChannels function:

//This kernel takes in an image represented as a uchar4 and splits
//it into three images consisting of only one color channel each
__global__
void separateChannels(const uchar4* const inputImageRGBA,
                                int numRows,
                                int numCols,
                                unsigned char* const redChannel,
                                unsigned char* const greenChannel,
                                unsigned char* const blueChannel)
{
    //const int2 thread_2D_pos = make_int2(blockIdx.x * blockDim.x + threadIdx.x, blockIdx.y * blockDim.y + threadIdx.y);
    const int col = blockIdx.x * blockDim.x + threadIdx.x;
    const int row = blockIdx.y * blockDim.y + threadIdx.y;

    //if (thread_2D_pos.x >= numCols || thread_2D_pos.y >= numRows)
    //  return;
    if (col >= numCols || row >= numRows)
        return;

    //const int thread_1D_pos = thread_2D_pos.y * numCols + thread_2D_pos.x;
    int arrayPos = row * numCols + col;

    uchar4 rgba = inputImageRGBA[arrayPos];
    redChannel[arrayPos] = rgba.x;
    greenChannel[arrayPos] = rgba.y;
    blueChannel[arrayPos] = rgba.z;
}
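Independently of the launch failure, once the kernel runs it is handy to compare its output against a plain host loop that performs the same split. A minimal sketch, assuming the image is stored row-major as numRows * numCols uchar4 pixels, as the kernel above assumes; separateChannelsHost is a hypothetical name:

```cuda
#include <cuda_runtime.h>  // uchar4, make_uchar4
#include <cstddef>
#include <vector>

// Host reference for separateChannels: same row-major indexing and the same
// channel mapping (x -> red, y -> green, z -> blue; alpha in w is dropped).
static void separateChannelsHost(const std::vector<uchar4>& rgba,
                                 int numRows, int numCols,
                                 std::vector<unsigned char>& red,
                                 std::vector<unsigned char>& green,
                                 std::vector<unsigned char>& blue)
{
    const size_t numPixels = static_cast<size_t>(numRows) * numCols;
    red.resize(numPixels);
    green.resize(numPixels);
    blue.resize(numPixels);
    for (int row = 0; row < numRows; ++row) {
        for (int col = 0; col < numCols; ++col) {
            const size_t i = static_cast<size_t>(row) * numCols + col;
            red[i]   = rgba[i].x;
            green[i] = rgba[i].y;
            blue[i]  = rgba[i].z;
        }
    }
}
```

Copying the GPU planes back with cudaMemcpy and comparing them element by element against these vectors catches indexing mistakes that would not trigger a launch error.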

I think I've included everything necessary; if not, please let me know.

1 Answer:

Answer 0 (score: 2)

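Without seeing the rest of your code I can't be certain, but I believe you're passing a pointer to host memory as an argument to the CUDA kernel, which won't work: the kernel runs on the GPU and generally cannot dereference an ordinary host pointer, so the read of inputImageRGBA[arrayPos] faults and the launch fails. In the kernel launch you pass h_inputImageRGBA, whereas I believe you meant to pass d_inputImageRGBA.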

By convention, the h_ prefix denotes host memory and d_ denotes device memory.
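To illustrate the answer's point outside the course harness, here is a minimal standalone sketch (it needs a CUDA-capable GPU): the host image is first copied into device memory with cudaMemcpy, and only the resulting device pointer is handed to the kernel. It also checks the launch (cudaGetLastError) and the kernel's execution (cudaDeviceSynchronize) separately, because a bad pointer typically shows up only at the second check. The kernel body is the one from the question; splitOnDevice and CUDA_TRY are hypothetical names.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Hypothetical helper: return the first CUDA error from the enclosing function.
#define CUDA_TRY(call)                        \
    do {                                      \
        cudaError_t err_ = (call);            \
        if (err_ != cudaSuccess) return err_; \
    } while (0)

// Kernel from the question: one thread per pixel, splits RGBA into three planes.
__global__ void separateChannels(const uchar4* const inputImageRGBA,
                                 int numRows, int numCols,
                                 unsigned char* const redChannel,
                                 unsigned char* const greenChannel,
                                 unsigned char* const blueChannel)
{
    const int col = blockIdx.x * blockDim.x + threadIdx.x;
    const int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col >= numCols || row >= numRows) return;
    const int i = row * numCols + col;
    const uchar4 rgba = inputImageRGBA[i];
    redChannel[i] = rgba.x;
    greenChannel[i] = rgba.y;
    blueChannel[i] = rgba.z;
}

// Copies h_image to the device, runs the kernel on the device copy, and copies
// the three planes back. Device buffers leak if an early error return fires;
// acceptable for a sketch.
static cudaError_t splitOnDevice(const std::vector<uchar4>& h_image,
                                 int numRows, int numCols,
                                 std::vector<unsigned char>& h_red,
                                 std::vector<unsigned char>& h_green,
                                 std::vector<unsigned char>& h_blue)
{
    const size_t numPixels = static_cast<size_t>(numRows) * numCols;
    uchar4* d_image = nullptr;
    unsigned char *d_red = nullptr, *d_green = nullptr, *d_blue = nullptr;

    CUDA_TRY(cudaMalloc(&d_image, numPixels * sizeof(uchar4)));
    CUDA_TRY(cudaMalloc(&d_red, numPixels));
    CUDA_TRY(cudaMalloc(&d_green, numPixels));
    CUDA_TRY(cudaMalloc(&d_blue, numPixels));

    // Host -> device: the kernel reads d_image, never h_image.data().
    CUDA_TRY(cudaMemcpy(d_image, h_image.data(), numPixels * sizeof(uchar4),
                        cudaMemcpyHostToDevice));

    const dim3 blockSize(16, 16, 1);
    const dim3 gridSize((numCols + blockSize.x - 1) / blockSize.x,
                        (numRows + blockSize.y - 1) / blockSize.y, 1);
    separateChannels<<<gridSize, blockSize>>>(d_image, numRows, numCols,
                                              d_red, d_green, d_blue);
    CUDA_TRY(cudaGetLastError());       // launch/configuration errors
    CUDA_TRY(cudaDeviceSynchronize());  // errors raised while the kernel ran

    h_red.resize(numPixels);
    h_green.resize(numPixels);
    h_blue.resize(numPixels);
    CUDA_TRY(cudaMemcpy(h_red.data(), d_red, numPixels, cudaMemcpyDeviceToHost));
    CUDA_TRY(cudaMemcpy(h_green.data(), d_green, numPixels, cudaMemcpyDeviceToHost));
    CUDA_TRY(cudaMemcpy(h_blue.data(), d_blue, numPixels, cudaMemcpyDeviceToHost));

    cudaFree(d_image);
    cudaFree(d_red);
    cudaFree(d_green);
    cudaFree(d_blue);
    return cudaSuccess;
}
```

Inside the assignment itself the fix should be just the one argument: your_gaussian_blur already receives d_inputImageRGBA, so pass that to separateChannels instead of h_inputImageRGBA.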