OpenMP and parallelizing a vector of vectors

Time: 2018-01-12 15:35:45

Tags: c++ vector openmp

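I have a fixed-size 2D matrix of size W x H, where each element of the matrix is a std::vector. The data is stored in a vector of vectors with linearized indexing. I am trying to find a way to fill the output vectors in parallel. Here is some code that shows what I am trying to do.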

#include <cmath>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <mutex>
#include <vector>
#include <omp.h>

struct Vector2d
{
    double x;
    double y;
};

double generate(double range_min, double range_max)
{
    double val = (double)rand() / RAND_MAX;
    return range_min + val * (range_max - range_min);
}

int main(int argc, char** argv)
{
    (void)argc;
    (void)argv;

    // generate input data
    std::vector<Vector2d> points;
    size_t num = 10000000;
    size_t w = 100;
    size_t h = 100;

    for (size_t i = 0; i < num; ++i)
    {
        Vector2d point;
        point.x = generate(0, w);
        point.y = generate(0, h);
        points.push_back(point);
    }

    // output
    std::vector<std::vector<Vector2d> > output(num, std::vector<Vector2d>());
    std::mutex mutex;

    auto start = std::chrono::system_clock::now();

    #pragma omp parallel for
    for (size_t i = 0; i < num; ++i)
    {
        const Vector2d point = points[i];
        size_t x = std::floor(point.x);
        size_t y = std::floor(point.y);
        size_t id = y * w + x;
        mutex.lock();
        output[id].push_back(point);
        mutex.unlock();
    }

    auto end = std::chrono::system_clock::now();
    std::chrono::duration<double> elapsed_seconds = end - start;
    std::cout << "elapsed time: " << elapsed_seconds.count() << "s\n";

    return 0;
}

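The problem is that the code is much slower when OpenMP is enabled. I found some examples that fill a std::vector using a reduction, but I don't know how to adapt that to a vector of vectors. Any help is appreciated, thanks!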

1 Answer:

Answer 0: (score: 0)

There are a few things you can do to improve performance:

I would preallocate the inner vectors that hold the Vector2d objects, because every time you push_back a new Vector2d beyond the capacity of the std::vector, it reallocates. So if you don't mind the Vector2d elements in each std::vector being initialized up front, I would just use:

std::vector<std::vector<Vector2d> > output(num, std::vector<Vector2d>(num, Vector2d(/*whatever goes in here*/)));

Then in your for loop you can access the elements of the inner vector through operator[], which lets you get rid of the lock.

#pragma omp parallel for
for (size_t i = 0; i < num; ++i)
{
    const Vector2d point = points[i];
    size_t x = std::floor(point.x);
    size_t y = std::floor(point.y);
    size_t id = y * w + x;
    output[id][i] = point;
}

I'm not sure, though, whether the approach above works for what you are trying to do. Otherwise, you can reserve capacity for each std::vector<Vector2d>, which lets you keep your initial loop:

std::vector<std::vector<Vector2d> > output(num, std::vector<Vector2d>());
for (size_t i = 0; i < num; ++i)
{
    output[i].reserve(num);
}

#pragma omp parallel for
for (size_t i = 0; i < num; ++i)
{
    const Vector2d point = points[i];
    size_t x = std::floor(point.x);
    size_t y = std::floor(point.y);
    size_t id = y * w + x;
    mutex.lock();
    output[id].push_back(point);
    mutex.unlock();
}

This means you get rid of the vector reallocations, but you still have the mutex...
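
One way to drop the mutex entirely might be to let every thread fill its own private set of cells and merge them once at the end, so the lock is taken once per thread rather than once per point. The following is only a sketch under some assumptions: bin_points is a hypothetical helper name, the output is sized w * h rather than num, coordinates are assumed non-negative, and a coordinate exactly equal to w or h is clamped into the last cell. Whether it is actually faster depends on num, on w * h, and on the number of threads, so it would need to be measured.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vector2d
{
    double x;
    double y;
};

// Hypothetical helper (not part of the question): bins points into w * h cells.
// Assumes all coordinates are non-negative.
std::vector<std::vector<Vector2d> > bin_points(const std::vector<Vector2d>& points,
                                               std::size_t w, std::size_t h)
{
    assert(w > 0 && h > 0);
    std::vector<std::vector<Vector2d> > output(w * h);

    #pragma omp parallel
    {
        // Private per-thread cells: push_back needs no lock here.
        std::vector<std::vector<Vector2d> > local(w * h);

        #pragma omp for nowait
        for (std::size_t i = 0; i < points.size(); ++i)
        {
            const Vector2d& point = points[i];
            // Clamp so a coordinate equal to w or h still maps to a valid cell.
            std::size_t x = std::min(static_cast<std::size_t>(std::floor(point.x)), w - 1);
            std::size_t y = std::min(static_cast<std::size_t>(std::floor(point.y)), h - 1);
            local[y * w + x].push_back(point);
        }

        // Merge step: one critical section per thread instead of one per point.
        #pragma omp critical
        {
            for (std::size_t id = 0; id < output.size(); ++id)
            {
                output[id].insert(output[id].end(), local[id].begin(), local[id].end());
            }
        }
    }

    return output;
}
```

With the data from the question this would be called as bin_points(points, w, h). The order of points inside a cell depends on which thread merges first, so it is not deterministic. If the code is compiled without OpenMP support, the pragmas are ignored and the function simply runs serially.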