Why is OpenMP with random number generation slower than serial code?

Posted: 2014-09-15 11:35:24

Tags: c++ random openmp

I am trying to add parallelism to my program using OpenMP.

std::random_device rd;
std::mt19937 generator(rd());
std::uniform_real_distribution<float> distribution(-0.5, 0.5);

#pragma omp parallel for
for(int i = 0; i < 100000000; i++)
{
    float x = distribution(generator);
}

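I tested the code on a 12-core processor on both Windows (Visual Studio 2010) and Linux (CentOS 6.5, gcc 4.9.1), and found that the parallel version is slower than the serial code.

The results are as follows: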

g++ test.cpp -o test -std=c++11 -Ofast
time ./test
real    0m1.234s
user    0m1.229s
sys 0m0.004s

g++ test.cpp -o test -fopenmp -std=c++11 -Ofast
time ./test
real    0m1.708s
user    0m24.218s
sys 0m0.010s

Why is the OpenMP version slower than the serial code?

1 answer:

Answer 0 (score: 2)

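You are using a single random number generator across multiple threads. Every call for a new random number updates that one generator's internal state, so the calls cannot proceed independently; each one effectively holds up all the other parallel calls until it is done. Note that std::mt19937 does no locking of its own, so calling the same engine from several threads at once is also a data race.

If you were to profile the code, you would probably find that all (or most) of the execution time is spent fighting over that shared state, in some form of mutex lock/unlock or its hardware equivalent, with the cores passing the generator's memory back and forth. This problem is called contention, and your scenario is a textbook example of how to cause it.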
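To make the contention concrete, here is a minimal sketch in which several threads draw from one engine. It assumes a std::mutex is used to make the sharing well-defined; the helper name draw_from_shared_engine and the seed are hypothetical, chosen only for illustration.

```cpp
#include <cstddef>
#include <mutex>
#include <random>
#include <thread>
#include <vector>

// Hypothetical helper: n_threads threads all draw from ONE engine.
// The mutex makes the sharing well-defined, but it also serializes every draw.
// Returns the total number of draws performed.
std::size_t draw_from_shared_engine(std::size_t n_threads, std::size_t draws_per_thread)
{
  std::mt19937 engine(12345);   // the single shared engine (arbitrary seed)
  std::mutex engine_mutex;      // protects both engine and total
  std::size_t total = 0;

  std::vector<std::thread> threads;
  for (std::size_t t = 0; t < n_threads; ++t)
    threads.emplace_back([&] {
      std::uniform_real_distribution<float> distribution(-0.5f, 0.5f);
      for (std::size_t i = 0; i < draws_per_thread; ++i) {
        // Only one thread at a time gets past this line; the rest wait.
        std::lock_guard<std::mutex> lock(engine_mutex);
        float x = distribution(engine);
        (void)x;
        ++total;
      }
    });

  for (auto& th : threads)
    th.join();
  return total;
}
```

Because every draw happens under the same lock, adding threads to a call such as draw_from_shared_engine(12, 1000000) should not be expected to make it finish any sooner than a single thread would.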

If you use std::thread and give each thread its own RNG, you will get close to 100% parallelization for that part of the code.
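The same one-engine-per-thread idea can also be applied without leaving OpenMP. The following is a minimal sketch, not a definitive implementation: the function name sum_uniform_draws and the base_seed + thread id seeding scheme are assumptions made for illustration, and the draws are summed only so that the loop has an observable result.

```cpp
#include <cassert>
#include <random>
#ifdef _OPENMP
#include <omp.h>
#endif

// Sums n uniform draws from [-0.5, 0.5); each thread uses its own engine.
// Seeding with base_seed + thread id is a simple illustrative scheme
// (an assumption, not a recommendation for statistically demanding work).
double sum_uniform_draws(long n, unsigned base_seed)
{
  assert(n >= 0);
  double total = 0.0;

  #pragma omp parallel reduction(+:total)
  {
    unsigned thread_id = 0;
#ifdef _OPENMP
    thread_id = static_cast<unsigned>(omp_get_thread_num());
#endif
    // Declared inside the parallel region, so every thread gets a private copy.
    std::mt19937 generator(base_seed + thread_id);
    std::uniform_real_distribution<float> distribution(-0.5f, 0.5f);

    #pragma omp for
    for (long i = 0; i < n; ++i)
      total += distribution(generator);
  }
  return total;
}
```

With g++ this is built with -fopenmp as in the question; without that flag the pragmas are ignored and the function simply runs serially with one engine.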

Some code to get you started with std::thread is below. Note the use of std::ref.

#include <array>
  using std::array;
#include <cstddef>
  using std::size_t;
#include <functional>
  using std::ref;
#include <iostream>
  using std::cout;
#include <iterator>
  using std::iterator_traits;
#include <thread>
  using std::thread;
#include <vector>
  using std::vector;
#include <random>
  using mersenne_twister = std::mt19937;

template<class T, T N>
array<T, N> series_of_numbers()
{
  array<T, N> arr;
  for(T i=0; i<N; ++i)
    arr[i] = i;

  return arr;
}

template<class Iterator, class Engine>
void generate_rng(Iterator begin, Iterator end, Engine& engine)
{
  std::uniform_real_distribution<double> dist;
  for(auto it = begin; it != end; ++it)
    *it = dist(engine);
}

int main()
{
  const size_t amount_of_random_numbers = 1024;
  // Engines
  const size_t Nrng = 4;
  auto seed_values = series_of_numbers<size_t, Nrng>(); // choose other seeds if you wish
  array<mersenne_twister, Nrng> engines;
  for(size_t i=0; i<Nrng; ++i)
    engines[i].seed(seed_values[i]);

  vector<thread> threads;
  vector<double> rngs(amount_of_random_numbers);

  // relevant iterators with offsets
  vector<vector<double>::iterator> begins = { rngs.begin(),
                                              rngs.begin() + amount_of_random_numbers/Nrng,
                                              rngs.begin() + 2*amount_of_random_numbers/Nrng,
                                              rngs.begin() + 3*amount_of_random_numbers/Nrng };

  // ends[n] is where chunk n stops, i.e. where chunk n+1 begins
  vector<vector<double>::iterator> ends = { rngs.end() - 3*amount_of_random_numbers/Nrng,
                                            rngs.end() - 2*amount_of_random_numbers/Nrng,
                                            rngs.end() - amount_of_random_numbers/Nrng,
                                            rngs.end() };
  // create threads
  for(size_t n=0; n<Nrng; ++n)
    threads.emplace_back(thread(generate_rng<decltype(begins[n]), mersenne_twister>, ref(begins[n]), ref(ends[n]), ref(engines[n])));

  // join threads -> the work started when each thread was created; this waits for it to finish.
  for(size_t n=0; n<Nrng; ++n)
    threads[n].join();

  // rngs is filled with magical values!
  for(auto number : rngs)
    std::cout << number << '\n';
}

Live demo at Coliru. There is also another version in which you can actually change the number of threads to any multiple of 4.
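For an arbitrary number of threads, the chunk boundaries can be computed in a loop instead of being written out by hand. This is a sketch under stated assumptions and is not the linked version: the helper name fill_random and the base_seed + thread index seeding are hypothetical, and the last thread simply takes whatever remainder is left.

```cpp
#include <cstddef>
#include <random>
#include <thread>
#include <vector>

// Hypothetical helper: fills `out` with uniform doubles using n_threads threads.
// Each thread owns one contiguous chunk and one engine of its own.
void fill_random(std::vector<double>& out, std::size_t n_threads, unsigned base_seed)
{
  if (n_threads == 0)
    n_threads = 1;
  const std::size_t chunk = out.size() / n_threads;

  std::vector<std::thread> threads;
  for (std::size_t t = 0; t < n_threads; ++t) {
    auto first = out.begin() + t * chunk;
    // The last thread also takes the remainder when the size is not a multiple.
    auto last = (t + 1 == n_threads) ? out.end() : first + chunk;
    const unsigned seed = base_seed + static_cast<unsigned>(t);

    threads.emplace_back([first, last, seed] {
      std::mt19937 engine(seed);                     // private to this thread
      std::uniform_real_distribution<double> dist;   // default range [0, 1)
      for (auto it = first; it != last; ++it)
        *it = dist(engine);
    });
  }

  for (auto& th : threads)
    th.join();
}
```

A call such as fill_random(rngs, 12, 0) would use twelve threads. Since each chunk depends only on its own seed, the output is reproducible for a fixed thread count and base seed.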