I am trying to add parallelism to my program using OpenMP.
std::random_device rd;
std::mt19937 generator(rd());
std::uniform_real_distribution<float> distribution(-0.5, 0.5);
#pragma omp parallel for
for(int i = 0; i < 100000000; i++)
{
float x = distribution(generator);
}
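I tested the code on a 12-core processor on both Windows (Visual Studio 2010) and Linux (CentOS 6.5, gcc 4.9.1) and found that the parallel version is slower than the serial code.
The results are as follows: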
g++ test.cpp -o test -std=c++11 -Ofast
time ./test
real 0m1.234s
user 0m1.229s
sys 0m0.004s
g++ test.cpp -o test -fopenmp -std=c++11 -Ofast
time ./test
real 0m1.708s
user 0m24.218s
sys 0m0.010s
Why is the OpenMP version slower than the serial code?
Answer 0 (score: 2)
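You are using a single random number generator from multiple threads. Every call for a new random number updates the generator's shared state, so the threads end up waiting on each other instead of running in parallel. Note also that std::mt19937 is not synchronized, so calling it from several threads at once is a data race.
If you were to profile your code, you would probably find that all (or most) of the execution time is spent contending for that shared state rather than doing useful work. This problem is called contention, and your scenario is a textbook example of how to cause it.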
If you use std::thread and give each thread its own RNG, you will achieve almost 100% parallelization for that part of the code.
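The same per-thread idea can also be kept within OpenMP. The following is a minimal sketch, not the asker's original code: fill_uniform is a hypothetical helper, and it stores the results in a vector (the loop in the question discarded them). It assumes that seeding each thread's generator from std::random_device is acceptable for the application.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Hypothetical helper: fills `out` with uniform values between -0.5 and 0.5.
// Each OpenMP thread constructs its own generator and distribution inside
// the parallel region, so no generator state is shared between threads.
// Without -fopenmp the pragmas are ignored and the loop runs serially.
void fill_uniform(std::vector<float>& out)
{
    // Older OpenMP versions (e.g. 2.0 in Visual Studio) need a signed loop index.
    assert(out.size() <= static_cast<std::size_t>(std::numeric_limits<long>::max()));
    const long n = static_cast<long>(out.size());

    #pragma omp parallel
    {
        // Declared inside the parallel region, so private to this thread.
        std::random_device rd;
        std::mt19937 generator(rd());
        std::uniform_real_distribution<float> distribution(-0.5f, 0.5f);

        // The iterations are divided among the threads of the enclosing region.
        #pragma omp for
        for (long i = 0; i < n; ++i)
            out[i] = distribution(generator);
    }
}
```

Calling fill_uniform on a std::vector<float> of the desired size replaces the loop from the question. Because every thread seeds itself from std::random_device, the output is not reproducible between runs; use fixed per-thread seeds if that matters.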
Some code to get you started with std::thread is below. Note the use of std::ref.
#include <array>
using std::array;
#include <cstddef>
using std::size_t;
#include <functional>
using std::ref;
#include <iostream>
using std::cout;
#include <iterator>
using std::iterator_traits;
#include <thread>
using std::thread;
#include <vector>
using std::vector;
#include <random>
using mersenne_twister = std::mt19937;
template<class T, T N>
array<T, N> series_of_numbers()
{
array<T, N> arr;
for(T i=0; i<N; ++i)
arr[i] = i;
return arr;
}
template<class Iterator, class Engine>
void generate_rng(Iterator begin, Iterator end, Engine& engine)
{
std::uniform_real_distribution<double> dist;
for(auto it = begin; it != end; ++it)
*it = dist(engine);
}
int main()
{
const size_t amount_of_random_numbers = 1024;
// Engines
const size_t Nrng = 4;
auto seed_values = series_of_numbers<size_t, Nrng>(); // choose other seeds if you wish
array<mersenne_twister, Nrng> engines;
for(size_t i=0; i<Nrng; ++i)
engines[i].seed(seed_values[i]);
vector<thread> threads;
vector<double> rngs(amount_of_random_numbers);
// relevant iterators with offsets
vector<vector<double>::iterator> begins = { rngs.begin(),
rngs.begin() + amount_of_random_numbers/Nrng,
rngs.begin() + 2*amount_of_random_numbers/Nrng,
rngs.begin() + 3*amount_of_random_numbers/Nrng };
// each chunk ends where the next one begins; the last chunk ends at rngs.end()
vector<vector<double>::iterator> ends = { rngs.begin() + amount_of_random_numbers/Nrng,
rngs.begin() + 2*amount_of_random_numbers/Nrng,
rngs.begin() + 3*amount_of_random_numbers/Nrng,
rngs.end() };
// create threads
for(size_t n=0; n<Nrng; ++n)
threads.emplace_back(thread(generate_rng<decltype(begins[n]), mersenne_twister>, ref(begins[n]), ref(ends[n]), ref(engines[n])));
// join threads: the workers started running when they were constructed; this waits for them to finish
for(size_t n=0; n<Nrng; ++n)
threads[n].join();
// rngs is filled with magical values!
for(auto number : rngs)
std::cout << number << '\n';
}
Live demo at Coliru. There is also another version in which you can change the number of threads to any multiple of 4.
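For an arbitrary thread count, the chunk boundaries can be computed instead of written out by hand. This is a rough sketch under stated assumptions, not the linked version: fill_chunk and generate_parallel are hypothetical names, and the engines are seeded with 0, 1, 2, ... as in the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <thread>
#include <vector>

// Fills the half-open range [first, last) using a thread-private engine.
void fill_chunk(std::vector<double>::iterator first,
                std::vector<double>::iterator last,
                std::mt19937& engine)
{
    std::uniform_real_distribution<double> dist;  // default range, 0 to 1
    for (auto it = first; it != last; ++it)
        *it = dist(engine);
}

// Hypothetical helper: generates `count` numbers using `num_threads` threads.
std::vector<double> generate_parallel(std::size_t count, std::size_t num_threads)
{
    assert(num_threads > 0);
    std::vector<double> values(count);

    // One engine per thread, seeded 0, 1, 2, ... (choose other seeds if you wish).
    std::vector<std::mt19937> engines(num_threads);
    for (std::size_t t = 0; t < num_threads; ++t)
        engines[t].seed(static_cast<std::mt19937::result_type>(t));

    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (std::size_t t = 0; t < num_threads; ++t) {
        // Chunk t covers indices [t*count/num_threads, (t+1)*count/num_threads).
        auto first = values.begin() + static_cast<std::ptrdiff_t>(t * count / num_threads);
        auto last = values.begin() + static_cast<std::ptrdiff_t>((t + 1) * count / num_threads);
        // Iterators are copied into the thread; the engine is passed by reference.
        threads.emplace_back(fill_chunk, first, last, std::ref(engines[t]));
    }
    for (auto& th : threads)
        th.join();  // wait for every worker to finish
    return values;
}
```

With this layout, count does not have to be divisible by the number of threads: the integer division spreads the remainder across chunks, and the last chunk always ends at values.end(). With GCC on Linux, compiling with -pthread is typically needed.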