Question

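I wrote a small program that extracts the edges of a digital image (the well-known Canny detector). I need to measure the exact time, in milliseconds, that the algorithm takes to execute on the device (GPU), including the data transfer stages. I attach the working program code in C++: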

#include <iostream>
#include <sys/time.h>
#include <opencv2/core.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/opencv.hpp>
#include <opencv2/cudaimgproc.hpp>
#include <cuda_runtime.h>
#include <opencv2/core/cuda.hpp>
using namespace cv;
using namespace std;


__device__ __host__
void FirstRun (void)
{
    cudaSetDevice(0);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
}

int main( int argc, char** argv )
{
    clock_t time;
    if (argc != 2) 
    {
        cout << "Wrong number of arguments!" << endl;
        return -1;
    }
    const char* filename = argv[1];
    Mat img = imread(filename, IMREAD_GRAYSCALE);
    if( !img.data )
    { 
        cout << " --(!) Error reading images \n" << endl;
        return -2; 
    }

    double low_tresh = 100.0;
    double high_tresh = 150.0;
    int apperture_size = 3;
    bool useL2gradient = false;

    int imageWidth = img.cols;  
    int imageHeight = img.rows; 
    cout << "Width of image: " << imageWidth  << endl;
    cout << "Height of image: " << imageHeight << endl;
    cout << endl;

    FirstRun();

    // Canny algorithm
    cuda::GpuMat d_img(img);
    cuda::GpuMat d_edges;

    time = clock();
    Ptr<cuda::CannyEdgeDetector> canny = cuda::createCannyEdgeDetector(low_tresh, high_tresh, apperture_size, useL2gradient);
    canny->detect(d_img, d_edges);
    time = clock() - time;
    cout << "CannyCUDA time (ms): " << (float)time / CLOCKS_PER_SEC * 1000  << endl;
    return 0;
}

I get two different running times (image 7741 x 8862)

System configuration:

1) CPU: Intel Core i7 9600K (3.6 GHz), 32 GB RAM;

2) GPU: Nvidia GeForce RTX 2080 Ti;

3) OpenCV version 4.0

Which time is correct, and how can I measure it correctly? Thank you!

Answer 1

When working with CUDA, there are several different times you can measure.

Here are some approaches you may want to try:

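Measure the total time used by CUDA: take an absolute time value with time() right before the first CUDA function call, and call time() again right after you have the result. The difference is the real time that elapsed.
Measure only the computation time: CUDA has some startup overhead. If that overhead does not interest you, because you will run your code many times without leaving the CUDA environment, you can measure the computation separately. Please read the CUDA C Programming Guide, which explains how to use events for timing.
Use the profiler to get detailed information about which part of the program takes how much time: the kernel times are particularly interesting because they tell you how long the computation takes. Be careful when looking at the API times. In your example, cudaEventCreate() takes a lot of time because it is the first CUDA function in your program, so it includes the startup overhead. Also, cuda[...]Synchronize() does not actually take that long to call, but its reported time includes the time spent waiting for synchronization.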
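
The first two approaches can be sketched in plain C++. This is only a minimal sketch under stated assumptions, not code from the answer: the names elapsed_ms, Timing, measure, and fake_gpu_pipeline are hypothetical, and the stand-in workload merely sleeps to imitate a one-time startup cost followed by cheaper repeated calls. It reads std::chrono::steady_clock rather than time(), since time() typically has one-second resolution, which is too coarse for millisecond readings.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Wall-clock time of one call to `work`, in milliseconds.
// steady_clock is monotonic; clock(), as used in the question, reports
// processor time on POSIX systems rather than elapsed real time.
inline double elapsed_ms(const std::function<void()>& work) {
    const auto t0 = std::chrono::steady_clock::now();
    work();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

struct Timing {
    double first_ms;   // first call: includes any one-time startup cost
    double steady_ms;  // mean of the later calls: startup cost excluded
};

// Times `work` once cold, then `runs` more times warm.
inline Timing measure(const std::function<void()>& work, int runs) {
    Timing t{};
    t.first_ms = elapsed_ms(work);
    double total = 0.0;
    for (int i = 0; i < runs; ++i) {
        total += elapsed_ms(work);
    }
    t.steady_ms = runs > 0 ? total / runs : 0.0;
    return t;
}

// Hypothetical stand-in for "upload + detect + download": sleeps 50 ms on
// first use (pretend context creation), then 5 ms on every call.
inline void fake_gpu_pipeline() {
    static bool initialised = false;
    if (!initialised) {
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        initialised = true;
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(5));
}
```

Calling measure(fake_gpu_pipeline, 3) gives a first_ms that includes the simulated startup cost and a steady_ms that does not. In the program from the question, the callable would presumably wrap d_img.upload(img), canny->detect(d_img, d_edges), and a d_edges.download(...) into a host Mat, so that both transfers fall inside the timed region; the blocking download means the GPU work has finished before the clock is read. For timing on the device itself, the CUDA events that the Programming Guide describes (cudaEventRecord, cudaEventSynchronize, cudaEventElapsedTime) play the same role.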
