Question

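I have the following CUDA program, which converts an RGBA image to grayscale in parallel. I want a version that runs sequentially, so that I can compare the two and get metrics such as speedup.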
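A comparison like this usually has two parts: checking that both versions produce the same gray image, and computing speedup as sequential time divided by parallel time. The following is a minimal C sketch. It assumes that both gray images are already in host memory. The helper names `countMismatches` and `speedup` are hypothetical. A tolerance of one gray level is allowed because GPU float arithmetic (for example, fused multiply-add) may round slightly differently from the CPU.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: counts pixels whose gray values differ by more than tol. */
size_t countMismatches(const unsigned char *cpuGray, const unsigned char *gpuGray,
                       size_t numPixels, int tol)
{
    size_t mismatches = 0;
    for (size_t i = 0; i < numPixels; ++i) {
        int diff = abs((int)cpuGray[i] - (int)gpuGray[i]);
        if (diff > tol)
            ++mismatches;
    }
    return mismatches;
}

/* Hypothetical helper: speedup of the parallel version over the sequential one. */
double speedup(double sequentialMs, double parallelMs)
{
    return sequentialMs / parallelMs;
}
```

A mismatch count of zero (with `tol` set to 1) suggests the two versions agree, and only then is the speedup figure meaningful.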

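As I understand it, to make it run sequentially I need to change it so that the image is stepped through pixel by pixel using two for loops (one for X, one for Y). The grayscale conversion should then be applied to each pixel before moving on to the next one.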
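The two-loop approach described here can be sketched as an ordinary C function. The outer loop walks the rows (y) and the inner loop walks the columns (x). The 1D offset is rebuilt as `y * width + x`, which matches the commented-out 2D line in the kernel. This sketch assumes 4 channels (RGBA, as stated above), and the function name `colorConvertSeq2D` is hypothetical.

```c
#define CHANNELS 4 /* assumption: RGBA input, 4 bytes per pixel */

/* Hypothetical sequential version: one loop for Y, one for X. */
void colorConvertSeq2D(unsigned char *grayImage, const unsigned char *rgbImage,
                       unsigned int width, unsigned int height)
{
    for (unsigned int y = 0; y < height; ++y) {
        for (unsigned int x = 0; x < width; ++x) {
            /* Same offset as the kernel's commented-out 2D line. */
            unsigned int grayOffset = y * width + x;
            unsigned int rgbOffset = grayOffset * CHANNELS;
            unsigned char r = rgbImage[rgbOffset];     /* red */
            unsigned char g = rgbImage[rgbOffset + 1]; /* green */
            unsigned char b = rgbImage[rgbOffset + 2]; /* blue; alpha is ignored */
            grayImage[grayOffset] = (unsigned char)(0.21f * r + 0.71f * g + 0.07f * b);
        }
    }
}
```

Iterating y in the outer loop and x in the inner loop visits memory in row-major order, the same order in which the bytes are stored.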

Although I have some idea of what I should be doing, I'm not sure where to edit the code or where to start.

Edit: I now understand that to make the program sequential, I need to edit the kernel itself.

The kernel is shown below:

__global__ void colorConvert(unsigned char * grayImage, unsigned char * rgbImage, unsigned int width, unsigned int height)
{
    unsigned int x = threadIdx.x + blockIdx.x * blockDim.x;
    //unsigned int y = threadIdx.y + blockIdx.y * blockDim.y; //this is needed if you use 2D grid and blocks
    //if ((x < width) && (y < height)) {
    //check if out of bounds
    if ((x < width*height)) {
        // get 1D coordinate for the grayscale image
        unsigned int grayOffset = x;// y*width + x; //this is needed if you use 2D grid and blocks
        // one can think of the RGB image as having
        // CHANNELS times as many columns as the grayscale image
        unsigned int rgbOffset = grayOffset*CHANNELS;
        unsigned char r = rgbImage[rgbOffset]; // red value for pixel
        unsigned char g = rgbImage[rgbOffset + 1]; // green value for pixel
        unsigned char b = rgbImage[rgbOffset + 2]; // blue value for pixel
        // perform the rescaling and store it
        // We multiply by floating point constants
        grayImage[grayOffset] = 0.21f*r + 0.71f*g + 0.07f*b;
    }
}

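I have removed the rest of my code from the question, as there is a lot of it to look through. If I want this kernel to run sequentially, using two for loops to step through each pixel and apply the grayImage[grayOffset] line to each one, how would I go about it?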

Answer 1

You only need one for loop. Your code stores all the image pixels in a one-dimensional array, so a single loop is enough.
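The following derivation shows the index arithmetic behind this. It assumes the row-major layout implied by the kernel's commented-out `y*width + x` line, with $w$ = width, $h$ = height and $C$ = CHANNELS:

```latex
\begin{aligned}
i &= y\,w + x, \qquad 0 \le x < w,\; 0 \le y < h \;\Longrightarrow\; 0 \le i < w\,h \\
\text{grayOffset} &= i, \qquad \text{rgbOffset} = C\,i
\end{aligned}
```

Each pair $(x, y)$ maps to a distinct $i$. A single loop over $i$ from $0$ to $w h - 1$ therefore visits exactly the same pixels as nested loops over $y$ and $x$.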

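I think the loop can be written like this, inside an ordinary host function that takes the same parameters as the kernel (the function name is just illustrative):

void colorConvertSequential(unsigned char * grayImage, unsigned char * rgbImage, unsigned int width, unsigned int height)
{
    // CHANNELS as defined in the rest of the original program
    for (unsigned int x = 0; x < width*height; ++x)
    {
        unsigned int grayOffset = x;
        unsigned int rgbOffset = grayOffset*CHANNELS;
        unsigned char r = rgbImage[rgbOffset]; // red value for pixel
        unsigned char g = rgbImage[rgbOffset + 1]; // green value for pixel
        unsigned char b = rgbImage[rgbOffset + 2]; // blue value for pixel
        // perform the rescaling and store it
        // We multiply by floating point constants
        grayImage[grayOffset] = 0.21f*r + 0.71f*g + 0.07f*b;
    }
}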
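To get the sequential time for the speedup calculation, the loop can be wrapped in a timer. The following is a minimal sketch that uses the standard C `clock()` function, which measures processor time and is adequate for a single-threaded loop. The names `colorConvertSequential` and `timeSequentialMs` are hypothetical, and CHANNELS is assumed to be 4 (RGBA). On the GPU side, the kernel would typically be timed separately (for example with `cudaEventRecord` and `cudaEventElapsedTime`), and the speedup would be the CPU time divided by the GPU time.

```c
#include <time.h>

#define CHANNELS 4 /* assumption: RGBA input */

/* Sequential conversion: a single loop over the flattened image, as in the answer. */
static void colorConvertSequential(unsigned char *grayImage, const unsigned char *rgbImage,
                                   unsigned int width, unsigned int height)
{
    for (unsigned int i = 0; i < width * height; ++i) {
        unsigned int rgbOffset = i * CHANNELS;
        grayImage[i] = (unsigned char)(0.21f * rgbImage[rgbOffset]
                                     + 0.71f * rgbImage[rgbOffset + 1]
                                     + 0.07f * rgbImage[rgbOffset + 2]);
    }
}

/* Hypothetical helper: runs the conversion `repeats` times, returns average ms per run. */
double timeSequentialMs(unsigned char *grayImage, const unsigned char *rgbImage,
                        unsigned int width, unsigned int height, int repeats)
{
    clock_t start = clock();
    for (int k = 0; k < repeats; ++k)
        colorConvertSequential(grayImage, rgbImage, width, height);
    clock_t end = clock();
    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC / repeats;
}
```

On small images, a single run may be shorter than the resolution of `clock()`. Repeating the conversion and averaging gives a more usable number.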

Converting a parallel CUDA program to run sequentially
