Question

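This is a function I am trying to optimize for the GPU. gpu::blur takes a very long time in this code. When I run the plain CPU version of this code, processing 30 images takes about 1.5 seconds (framesToProcess contains 30 images). When I run this version (using gpu:: functions and GpuMat), it takes more than 30 seconds. If I comment out the gpu::blur line, it runs in just 0.5 seconds. Please help me find what is wrong with the GPU version.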

void getContourCenters(vector<gpu::GpuMat>  &framesToProcess, vector<pointI>& contourCenter)
{
    size_t j = 0;

    for (int i = 1; i < framesToProcess.size(); i++)
    {

            gpu::GpuMat tempDifferenceImage, tempThresholdImage, tempBlurredImage;
            vector< vector<Point> > contours;
            vector<Vec4i> hierarchy;
            Rect objectBoundingRectangle = Rect(0, 0, 0, 0);
            gpu::absdiff(framesToProcess[i - 1], framesToProcess[i], tempDifferenceImage);
            gpu::threshold(tempDifferenceImage, tempThresholdImage, SENSITIVITY_VALUE, 255, THRESH_BINARY);
            gpu::blur(tempThresholdImage, tempBlurredImage, Size(BLUR_SIZE, BLUR_SIZE));

            Mat contourImage( tempBlurredImage );
            findContours(contourImage, contours, hierarchy, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE);
            for (int k = 0; k < contours.size(); ++k)
            {
                    objectBoundingRectangle = boundingRect(contours[k]);
                    int xpos = objectBoundingRectangle.x + objectBoundingRectangle.width / 2;
                    int ypos = objectBoundingRectangle.y + objectBoundingRectangle.height / 2;
                    contourCenter.push_back(mp(xpos, ypos, j++));
            }
    }
}

BLUR_SIZE is a constant with value 50. The images are 992 x 1000 and of type CV_8UC1. I am using an Nvidia Tegra K1. Here is the other (CPU) version of the code:

    void getContourCenters(vector<Mat>  &framesToProcess, vector<pointI>& contourCenter)
{    
    size_t j = 0;       
    for (int i = 1; i < framesToProcess.size(); i++)
    {    
                    Mat tempDifferenceImage, tempThresholdImage;
                    vector< vector<Point> > contours;
                    vector<Vec4i> hierarchy;
                    Rect objectBoundingRectangle = Rect(0, 0, 0, 0);
                    absdiff(framesToProcess[i - 1], framesToProcess[i], tempDifferenceImage);
                    threshold(tempDifferenceImage, tempThresholdImage, SENSITIVITY_VALUE, 255, THRESH_BINARY);
                    blur(tempThresholdImage, tempThresholdImage, Size(BLUR_SIZE, BLUR_SIZE));
                    findContours(tempThresholdImage, contours, hierarchy, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE);

                    for (int k = 0; k < contours.size(); ++k)
                    {
                            objectBoundingRectangle = boundingRect(contours[k]);
                            int xpos = objectBoundingRectangle.x + objectBoundingRectangle.width / 2;
                            int ypos = objectBoundingRectangle.y + objectBoundingRectangle.height / 2;
                            contourCenter.push_back(mp(xpos, ypos, j++));
                    }           

    }

}

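This code takes 1.5 seconds to run on the same machine. I wanted to optimize it for the GPU and wrote the version shown above, which takes more than 30 seconds.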
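To put the reported numbers in perspective, here is a small arithmetic sketch using only the figures from the question. The constant and function names are made up for illustration. The "taps per pixel" figure assumes a direct evaluation of the 50 x 50 window, which is an assumption about a naive box filter and not a statement about how gpu::blur or cv::blur is actually implemented.

```cpp
#include <cstddef>

// Figures taken from the question; all names here are illustrative only.
constexpr std::size_t kFrames   = 30;    // images in framesToProcess
constexpr std::size_t kWidth    = 992;   // image width in pixels
constexpr std::size_t kHeight   = 1000;  // image height in pixels
constexpr std::size_t kBlurSize = 50;    // BLUR_SIZE

// The loop starts at i = 1, so it handles one frame pair per iteration.
constexpr std::size_t iterations() { return kFrames - 1; }

// Pixels per CV_8UC1 frame (one byte per pixel).
constexpr std::size_t pixelsPerFrame() { return kWidth * kHeight; }

// Window reads per output pixel IF the box filter were evaluated directly.
// ASSUMPTION: this models a naive filter, not any specific OpenCV code path.
constexpr std::size_t tapsPerPixel() { return kBlurSize * kBlurSize; }

// Average seconds attributable to each blur call, given the total run time
// with the blur line and the total run time with it commented out.
constexpr double blurSecondsPerCall(double withBlur, double withoutBlur) {
    return (withBlur - withoutBlur) / static_cast<double>(iterations());
}
```

With the reported timings, blurSecondsPerCall(30.0, 0.5) comes out at just over one second per gpu::blur call (a lower bound, since the run took "more than" 30 seconds), while the whole CPU pipeline averages about 0.05 seconds per iteration (1.5 / 29).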

Answer 1

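As I already said in the comments, OpenCV has optimized support for Tegra (if the opencv4tegra library is used), so most likely your OpenCV GPU functions are not actually "slow"; rather, your "CPU" version is very fast because it internally calls Tegra-optimized functions (instead of running on the plain CPU only).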
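One way to check this on your own build is to time each stage separately in both versions. Below is a minimal sketch of a per-stage timer using only the standard library. StageTimer is a hypothetical helper, not part of OpenCV. It measures wall-clock time around a callable, so it only gives meaningful GPU numbers when the wrapped call blocks until the work is finished, and the first GPU call should probably be excluded as a warm-up because it may include one-time initialization.

```cpp
#include <chrono>
#include <functional>
#include <map>
#include <string>

// Accumulates wall-clock time per named stage.
// Hypothetical helper for illustration; not an OpenCV class.
class StageTimer {
public:
    // Runs `stage` once and adds its elapsed time to the total for `name`.
    void run(const std::string& name, const std::function<void()>& stage) {
        const auto start = std::chrono::steady_clock::now();
        stage();
        const auto stop = std::chrono::steady_clock::now();
        totals_[name] += std::chrono::duration<double>(stop - start).count();
        ++calls_[name];
    }

    // Total seconds spent in `name` so far (0 if the stage never ran).
    double totalSeconds(const std::string& name) const {
        const auto it = totals_.find(name);
        return it == totals_.end() ? 0.0 : it->second;
    }

    // Number of times `name` was run (0 if the stage never ran).
    int calls(const std::string& name) const {
        const auto it = calls_.find(name);
        return it == calls_.end() ? 0 : it->second;
    }

private:
    std::map<std::string, double> totals_;
    std::map<std::string, int> calls_;
};

// Usage inside the loop from the question (shown as a comment because it
// needs OpenCV to compile):
//   timer.run("blur", [&] {
//       gpu::blur(tempThresholdImage, tempBlurredImage,
//                 Size(BLUR_SIZE, BLUR_SIZE));
//   });
```

Comparing totalSeconds("blur") between the Mat version and the GpuMat version should show whether the difference really comes from the blur stage or from something around it.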

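From http://on-demand.gputechconf.com/gtc/2013/presentations/S3411-OpenCV-For-Tegra.pdf you can see that opencv4tegra contains several optimizations, including

NEON SIMD instructions
GLSL algorithms, using OpenGL shaders for speedup
Tegra hardware optimizations

all exposed through the same familiar "CPU" API.

Further details are available at http://docs.opencv.org/opencv2refman-tegra.pdf

List of OpenCV functions optimized for the Tegra platform in the current OpenCV version. The optimizations cover the most popular data types and operation modes, specified below for each function. When an optimized function is called on a data type or mode that is not covered by the optimization, the original implementation is called.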

cv::absdiff
cv::add
cv::addWeighted
cv::bitwise_and
cv::bitwise_not
cv::bitwise_or
cv::bitwise_xor
cv::compare
cv::countNonZero
cv::Mat::dot
cv::inRange
cv::max
cv::mean
cv::meanStdDev
cv::merge
cv::min
cv::minMaxLoc
cv::phase
cv::reduce
cv::split
cv::subtract
cv::sum
cv::Mat::convertTo
cv::blur
cv::boxFilter
cv::Canny
cv::cvtColor
cv::dilate
cv::erode
cv::filter2D
cv::GaussianBlur
cv::integral
cv::matchTemplate
cv::medianBlur
cv::pyrDown
cv::pyrUp
cv::resize
cv::Scharr
cv::Sobel
cv::threshold
cv::warpAffine
cv::warpPerspective
cv::FAST
cv::calcOpticalFlowPyrLK
cv::buildOpticalFlowPyramid
cv::detail::createLaplacePyr
cv::detail::normalizeUsingWeightMap
cv::detail::BestOf2NearestMatcher::match
cv::findCirclesGrid
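The fallback rule quoted above (optimized path for covered data types, original implementation otherwise, same API either way) can be pictured with a small dispatch sketch. This is only an illustration of the pattern, not opencv4tegra's actual code: absdiffDispatch, absdiffGeneric and the Path enum are hypothetical names, and the "optimized" branch is a plain loop standing in for what would be NEON or other hardware-specific code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Which implementation handled the call (for illustration only).
enum class Path { Optimized, Generic };

// Generic reference implementation; works for any arithmetic element type.
// Assumes a and b have the same length.
template <typename T>
std::vector<T> absdiffGeneric(const std::vector<T>& a, const std::vector<T>& b) {
    std::vector<T> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = static_cast<T>(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
    return out;
}

// Entry point for types without a dedicated path: falls back to the
// generic implementation.
template <typename T>
std::vector<T> absdiffDispatch(const std::vector<T>& a, const std::vector<T>& b,
                               Path* used = nullptr) {
    if (used) *used = Path::Generic;
    return absdiffGeneric(a, b);
}

// Same entry point for 8-bit data, the "covered" type in this sketch.
// A real library would call SIMD code here; this loop is only a stand-in.
inline std::vector<std::uint8_t> absdiffDispatch(const std::vector<std::uint8_t>& a,
                                                 const std::vector<std::uint8_t>& b,
                                                 Path* used = nullptr) {
    if (used) *used = Path::Optimized;
    std::vector<std::uint8_t> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        out[i] = static_cast<std::uint8_t>(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
    return out;
}
```

The caller writes absdiffDispatch(a, b) in both cases and never sees which branch ran, which is why a "CPU" call on such a build can be much faster than expected for common types like CV_8UC1.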

The documentation marks which of these functions are GPU-powered.
