我在处理低水平的图像处理方面非常陌生,并且刚刚开始实现同时具有GPU和CPU的高斯内核 - 但两者都产生相同的输出,图像严重偏斜网格:
我知道我可以使用OpenCV预先构建的函数来处理过滤器,但我想学习它背后的方法,所以我自己构建了。
卷积内核:
// Convolution kernel - this manipulates the given channel and writes out a new blurred channel.
void convoluteChannel_cpu(
const unsigned char* const channel, // Input channel
unsigned char* const channelBlurred, // Output channel
const size_t numRows, const size_t numCols, // Channel width/height (rows, cols)
const float *filter, // The weight of sigma, to convulge
const int filterWidth // This is normally a sample of 9
)
{
// Loop through the images given R, G or B channel
for(int rows = 0; rows < (int)numRows; rows++)
{
for(int cols = 0; cols < (int)numCols; cols++)
{
// Declare new pixel colour value
float newColor = 0.f;
// Loop for every row along the stencil size (3x3 matrix)
for(int filter_x = -filterWidth/2; filter_x <= filterWidth/2; filter_x++)
{
// Loop for every col along the stencil size (3x3 matrix)
for(int filter_y = -filterWidth/2; filter_y <= filterWidth/2; filter_y++)
{
// Clamp to the boundary of the image to ensure we don't access a null index.
int image_x = __min(__max(rows + filter_x, 0), static_cast<int>(numRows -1));
int image_y = __min(__max(cols + filter_y, 0), static_cast<int>(numCols -1));
// Assign the new pixel value to the current pixel, numCols and numRows are both 3, so we only
// need to use one to find the current pixel index (similar to how we find the thread in a block)
float pixel = static_cast<float>(channel[image_x * numCols + image_y]);
// Sigma is the new weight to apply to the image, we perform the equation to get a radnom weighting,
// if we don't do this the image will become choppy.
float sigma = filter[(filter_x + filterWidth / 2) * filterWidth + filter_y + filterWidth/2];
//float sigma = 1 / 81.f;
// Set the new pixel value
newColor += pixel * sigma;
}
}
// Set the value of the next pixel at the current image index with the newly declared color
channelBlurred[rows * numCols + cols] = newColor;
}
}
}
我用另一种将图像分成相应的R,G,B通道的方法称之为3次,但我不相信这会导致图像发生如此严重的变异。
有没有人遇到过类似的问题,如果有的话,你是怎么解决的?
编辑频道拆分功能:
void gaussian_cpu(
const uchar4* const rgbaImage, // Our input image from the camera
uchar4* const outputImage, // The image we are writing back for display
size_t numRows, size_t numCols, // Width and Height of the input image (rows/cols)
const float* const filter, // The value of sigma
const int filterWidth // The size of the stencil (3x3) 9
)
{
// Build an array to hold each channel for the given image
unsigned char *r_c = new unsigned char[numRows * numCols];
unsigned char *g_c = new unsigned char[numRows * numCols];
unsigned char *b_c = new unsigned char[numRows * numCols];
// Build arrays for each of the output (blurred) channels
unsigned char *r_bc = new unsigned char[numRows * numCols];
unsigned char *g_bc = new unsigned char[numRows * numCols];
unsigned char *b_bc = new unsigned char[numRows * numCols];
// Separate the image into R,G,B channels
for(size_t i = 0; i < numRows * numCols; i++)
{
uchar4 rgba = rgbaImage[i];
r_c[i] = rgba.x;
g_c[i] = rgba.y;
b_c[i] = rgba.z;
}
// Convolute each of the channels using our array
convoluteChannel_cpu(r_c, r_bc, numRows, numCols, filter, filterWidth);
convoluteChannel_cpu(g_c, g_bc, numRows, numCols, filter, filterWidth);
convoluteChannel_cpu(b_c, b_bc, numRows, numCols, filter, filterWidth);
// Recombine the channels to build the output image - 255 for alpha as we want 0 transparency
for(size_t i = 0; i < numRows * numCols; i++)
{
uchar4 rgba = make_uchar4(r_bc[i], g_bc[i], b_bc[i], 255);
outputImage[i] = rgba;
}
}
编辑调用内核
while(gpu_frames > 0)
{
//cout << gpu_frames << "\n";
camera >> frameIn;
// Allocate I/O Pointers
beginStream(&h_inputFrame, &h_outputFrame, &d_inputFrame, &d_outputFrame, &d_redBlurred, &d_greenBlurred, &d_blueBlurred, &_h_filter, &filterWidth, frameIn);
// Show the source image
imshow("Source", frameIn);
g_timer.Start();
// Allocate mem to GPU
allocateMemoryAndCopyToGPU(numRows(), numCols(), _h_filter, filterWidth);
// Apply the gaussian kernel filter and then free any memory ready for the next iteration
gaussian_gpu(h_inputFrame, d_inputFrame, d_outputFrame, numRows(), numCols(), d_redBlurred, d_greenBlurred, d_blueBlurred, filterWidth);
// Output the blurred image
cudaMemcpy(h_outputFrame, d_frameOut, sizeof(uchar4) * numPixels(), cudaMemcpyDeviceToHost);
g_timer.Stop();
cudaDeviceSynchronize();
gpuTime += g_timer.Elapsed();
cout << "Time for this kernel " << g_timer.Elapsed() << "\n";
Mat outputFrame(Size(numCols(), numRows()), CV_8UC1, h_outputFrame, Mat::AUTO_STEP);
clean_mem();
imshow("Dest", outputFrame);
// 1ms delay to prevent system from being interrupted whilst drawing the new frame
waitKey(1);
gpu_frames--;
}
然后在beginStream()方法中,图像被转换为uchar4:
// Allocate host variables, casting the frameIn and frameOut vars to uchar4 elements, these will
// later be processed by the kernel
*h_inputFrame = (uchar4 *)frameIn.ptr<unsigned char>(0);
*h_outputFrame = (uchar4 *)frameOut.ptr<unsigned char>(0);
答案 0 :(得分:1)
这个问题有很多疑点。 在代码的开头,它提到过滤器宽度为9,因此使其成为9x9内核。但是在其他一些评论中它说是3.所以我猜你实际上是在使用9x9内核而且过滤器确实有81个权重。
但上述输出绝不会是由于上述混淆造成的。
uchar4的大小为4字节。因此,在gaussian_cpu中,通过在不包含alpha值的图像上运行rgbaImage [i]上的循环来分割数据(可以从上面提到的循环中推断出alpha不存在)实际上完成的是你正在复制R1 ,G2,B3,R5,G6,B7等到红色通道。最好先尝试灰度图像上的代码,并确保使用的是uchar而不是uchar4。
输出图像看起来恰好是原始图像宽度的1/3,这使得上述假设成立。
编辑1:
输入rgbaImage到guassian_cpu函数是RGBA还是RGB? videoCapture必须提供3通道输出。 *h_inputFrame
(到uchar4)本身的初始化是错误的,因为它指向3通道数据。
类似地,输出数据是四个通道数据,但Mat outputFrame
被声明为指向这四个通道数据的单个通道。尝试将mat输出框架作为8UC3 type
并查看结果。
另外,代码如何工作,guassian_cpu()函数在定义中有7个输入参数,但是当你调用函数时,使用8个参数。希望这只是一个错字。