Question

I have this struct:

    struct CacheElem{
        CacheElem(const cv::Mat1f &code, const std::string &result) : code(code), result(result) {}
        cv::Mat1f code;
        std::string result;
    };

Then I defined a custom reduction based on the following Compare struct:

    struct Compare{
        Compare(double val = std::numeric_limits<double>::max(), size_t index = 0) : val(val), index(index) {}
        double val;
        size_t index;
    };
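A custom reduction over `Compare` is declared with `#pragma omp declare reduction` (OpenMP 4.0 or later). The sketch below assumes that setup. The combiner keeps whichever candidate has the smaller `val`, and the initializer starts each thread's private copy at the default `Compare()`, which is the identity for a minimum. `argminWithReduction` is a hypothetical helper that exists only to exercise the reduction on plain `double` distances.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Same layout as the Compare struct in the question.
struct Compare {
    Compare(double val = std::numeric_limits<double>::max(), std::size_t index = 0)
        : val(val), index(index) {}
    double val;
    std::size_t index;
};

// Combiner keeps the candidate with the smaller distance; each thread's
// private copy starts at Compare() (val = max, index = 0).
#pragma omp declare reduction(minimum : Compare : \
        omp_out = (omp_in.val < omp_out.val ? omp_in : omp_out)) \
        initializer(omp_priv = Compare())

// Hypothetical helper: smallest value in `dists` and its index,
// found with the custom `minimum` reduction.
Compare argminWithReduction(const std::vector<double> &dists) {
    Compare best;
    #pragma omp parallel for reduction(minimum : best)
    for (std::size_t i = 0; i < dists.size(); ++i) {
        if (dists[i] < best.val) {
            best.val = dists[i];
            best.index = i;
        }
    }
    return best;
}
```

When several elements tie for the minimum, which index survives depends on how the iterations were split among threads.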

Which is then used in this function:

    #pragma omp parallel
    {
        Compare localMin;
        #pragma omp for reduction(minimum:min) schedule(dynamic,1)
        for(size_t i=0; i<values.size(); i++){
            D d = distance(queryCode, values[i].code);
            if(d < localMin.val){
                localMin.val = d;
                localMin.index = i;
            }
        }
        #pragma omp critical
        if(localMin.val < min.val)
            min = localMin;
    }
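Below is a self-contained sketch of the same pattern used above: a private minimum per thread, merged once per thread in a critical section. It assumes each code is a `std::vector<float>` rather than a `cv::Mat1f`. The names `Code`, `euclidean` and `nearestIndex` are hypothetical. `schedule(static)` is a suggested swap for `schedule(dynamic,1)`, on the reasoning that every iteration does the same amount of work. It has not been measured here.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Stand-in for a 1xN cv::Mat1f row (assumption).
using Code = std::vector<float>;

// Euclidean distance between two equally sized codes.
static double euclidean(const Code &a, const Code &b) {
    double sum = 0.0;
    for (std::size_t j = 0; j < a.size(); ++j) {
        double diff = static_cast<double>(a[j]) - b[j];
        sum += diff * diff;
    }
    return std::sqrt(sum);
}

// Hypothetical nearest-neighbour search: per-thread minimum,
// merged into the shared result once per thread.
std::size_t nearestIndex(const Code &query, const std::vector<Code> &codes) {
    double bestVal = std::numeric_limits<double>::max();
    std::size_t bestIdx = 0;

    #pragma omp parallel
    {
        double localVal = std::numeric_limits<double>::max();
        std::size_t localIdx = 0;

        // Equal-cost iterations: static chunks avoid per-iteration scheduling.
        #pragma omp for schedule(static)
        for (std::size_t i = 0; i < codes.size(); ++i) {
            double d = euclidean(query, codes[i]);
            if (d < localVal) {
                localVal = d;
                localIdx = i;
            }
        }

        // One merge per thread, not per iteration.
        #pragma omp critical
        {
            if (localVal < bestVal) {
                bestVal = localVal;
                bestIdx = localIdx;
            }
        }
    }
    return bestIdx;
}
```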

Here `cv::norm` computes the Euclidean distance between the two row vectors.
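For reference, the quantity being minimized is the L2 distance between the query row, written $q$ here, and each stored row $c_i$ (both 1x4096). The symbols $q$ and $c_i$ are notation introduced only for this formula. Because the square root is monotonic, minimizing the squared distance selects the same index:

$$
d(q, c_i) = \lVert q - c_i \rVert_2 = \sqrt{\sum_{j=1}^{4096} \left(q_j - c_{i,j}\right)^2},
\qquad
\arg\min_i d(q, c_i) = \arg\min_i \sum_{j=1}^{4096} \left(q_j - c_{i,j}\right)^2
$$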

Strangely, this code gives better timings than the reduction-based approach (see the previous edit).

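With 100,000 elements in `values`, each `cv::Mat1f` of size 1x4096, on an Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz, each QueryCache call (called 100 times per run) takes about 0.127158 seconds with 8 threads, compared with 0.705878 seconds with 1 thread. That isn't terrible, but I'm sure it can be improved. Unfortunately, the 6-core machine I usually test on is unavailable right now.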
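A rough sanity check on these numbers follows. It assumes 4-byte floats and that every row of `values` is read once per query, and it ignores the `cv::Mat` headers and the 16 KB query row. Under those assumptions, the speedup and the effective read bandwidth work out as follows:

$$
\begin{aligned}
S &= \frac{T_1}{T_8} = \frac{0.705878\,\text{s}}{0.127158\,\text{s}} \approx 5.55 \\
\text{data per query} &= 100{,}000 \times 4096 \times 4\,\text{B} = 1{,}638{,}400{,}000\,\text{B} \approx 1.64\,\text{GB} \\
B_8 &\approx \frac{1.64\,\text{GB}}{0.127158\,\text{s}} \approx 12.9\,\text{GB/s}, \qquad
B_1 \approx \frac{1.64\,\text{GB}}{0.705878\,\text{s}} \approx 2.32\,\text{GB/s}
\end{aligned}
$$

The i7-4700MQ has 4 physical cores with Hyper-Threading, so the 8 threads share 4 cores. That, plus the roughly 13 GB/s being streamed from memory, may explain why the speedup levels off well below 8.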

This is how I measure time with cc::startTimer() and cc::stopTimer():

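    #include <sys/time.h>  // gettimeofday, struct timeval
    #include <iostream>
    #include <string>

    namespace cc {

    // Wall-clock seconds elapsed since `start`.
    float timeElapsed(const struct timeval &start){
        struct timeval end;
        gettimeofday(&end, NULL);

        // Work in double; a negative microsecond difference is
        // handled correctly by the sum below.
        double seconds  = end.tv_sec  - start.tv_sec;
        double useconds = end.tv_usec - start.tv_usec;

        return static_cast<float>(seconds + useconds / 1000000.0);
    }

    // Record the current wall-clock time.
    struct timeval startTimer(){
        struct timeval start;
        gettimeofday(&start, NULL);
        return start;
    }

    // Print (if `label` is non-empty) and return the seconds elapsed since `start`.
    float stopTimer(const struct timeval &start, const std::string &label){
        float time = timeElapsed(start);
        if(!label.empty())
            std::cout << label << " time " << time << " seconds" << std::endl;
        return time;
    }

    } // namespace cc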
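`gettimeofday` reports wall-clock time, which can jump if the system clock is adjusted during a run. A possible alternative is `std::chrono::steady_clock`, which is monotonic. Inside OpenMP code, `omp_get_wtime()` is another option. The sketch below mirrors the interface above using `steady_clock`. The namespace `cc_chrono` is hypothetical.

```cpp
#include <chrono>
#include <iostream>
#include <string>

// Hypothetical std::chrono counterpart of cc::startTimer / cc::stopTimer.
namespace cc_chrono {

using Clock = std::chrono::steady_clock;  // monotonic clock

// Record the current time point.
inline Clock::time_point startTimer() { return Clock::now(); }

// Print (if `label` is non-empty) and return the seconds elapsed since `start`.
inline double stopTimer(const Clock::time_point &start, const std::string &label) {
    std::chrono::duration<double> elapsed = Clock::now() - start;
    if (!label.empty())
        std::cout << label << " time " << elapsed.count() << " seconds" << std::endl;
    return elapsed.count();
}

} // namespace cc_chrono
```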

totalLookupTime is just a class member used to compute the average time spent in the parallel region across all the different QueryCache calls.

Parallelizing a minimum Euclidean distance search over large codes with OpenMP?
