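I want to use the C++17 parallel algorithms to divide each element of a std::vector by some constant and store the result in another std::vector of the same length and (!!) the same order.
E.g.
{6,9,12} / 3 = {2,3,4}
I have an example that does not compile: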
#include <execution>
#include <algorithm>
template <typename T>
std::vector<T> & divide(std::vector<T> const & in)
{
std::vector<T> out(in.size(), 0);
float const divisor = 3;
std::for_each
( std::execution::par_unseq
, in.begin()
, in.end()
, /* divide each element by divisor and put result in out */ );
return out;
}
How can I make this run, lock-free and thread-safe?
Answer 0 (score: 5)
Something like this:
#include <vector>
#include <algorithm>
#include <execution>
template <typename T>
std::vector<T> divide(std::vector<T> result)
{
// ^^ take a copy of the argument here - will often be elided anyway
float const divisor = 3;
// the following loop mutates distinct objects within the vector and
// invalidates no iterators. c++ guarantees that each object is distinct
// and that neighbouring objects may be updated by different threads
// at the same time without a mutex.
std::for_each(
std::execution::par,
std::begin(result),
std::end(result),
[divisor](T& val) { // divisor is captured by copy: safer, and the resulting code will be as quick.
// modifies value in place
val /= divisor;
});
// implicit fence here. Safe to manipulate the vector as a whole.
// from here on
// return by value. Allows RVO.
return result;
}
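Answer 1 (score: 0)
For this you need std::transform, not std::for_each. std::transform takes an input range and an output iterator, and writes the result for each input element to the corresponding output position.
The nice thing about std::transform is that, given an execution policy, the work can easily be spread across multiple CPU cores if needed. So: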
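#include <algorithm>
#include <execution>
#include <vector>
template <typename T>
std::vector<T> divide(std::vector<T> const & in)
{
    // return by value: returning a reference to the local `out` would dangle
    std::vector<T> out(in.size());
    float const divisor = 3;
    // unary std::transform takes only the start of the output range;
    // out[i] is computed from in[i], so the order is preserved
    std::transform
        ( std::execution::par_unseq
        , in.begin()
        , in.end()
        , out.begin()
        , [divisor](T const & val) {
              // returns a new value; the input is left untouched
              return static_cast<T>(val / divisor);
          });
    return out;
}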
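As a variation on the code above, here is a minimal sketch that takes the divisor as a parameter and lets the output element type differ from the input type, so integer inputs are not truncated. The name `divide_by` and its template parameters are hypothetical and not part of the question. Note also that with GCC's libstdc++ the parallel policies are typically backed by TBB, so linking with -ltbb may be required.

```cpp
#include <algorithm>
#include <cassert>
#include <execution>
#include <vector>

// Hypothetical helper: divides every element of `in` by `divisor` and
// returns the quotients in the same order as the inputs.
template <typename In, typename Out>
std::vector<Out> divide_by(std::vector<In> const& in, Out divisor)
{
    assert(divisor != Out{0});        // precondition assumed by this sketch
    std::vector<Out> out(in.size());  // pre-sized, so no reallocation occurs
    // Each invocation reads one in[i] and writes only out[i], so no two
    // threads touch the same element and no lock is needed.
    std::transform(std::execution::par_unseq,
                   in.begin(), in.end(), out.begin(),
                   [divisor](In const& val) {
                       return static_cast<Out>(val) / divisor;
                   });
    return out;
}
```

Calling `divide_by(std::vector<int>{6, 9, 12}, 3.0)` yields a `std::vector<double>` holding 2, 3, 4 in that order, because the output type is deduced from the divisor.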
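Side note: if speed matters, enable -ffast-math (a GCC/Clang flag that relaxes strict IEEE floating-point semantics) or multiply by (1 / divisor), computed once outside the loop. Either option can change the results slightly.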
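The reciprocal trick can be sketched as follows. The name `scale_by_reciprocal` is hypothetical, and the sketch runs sequentially for brevity; an execution policy can be passed as the first argument to std::transform, as in the answer. Multiplying by a rounded reciprocal may differ from true division in the last bit, so this assumes that small error is acceptable.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: computes in[i] * (1 / divisor) for every element.
std::vector<double> scale_by_reciprocal(std::vector<double> const& in,
                                        double divisor)
{
    assert(divisor != 0.0);                   // precondition assumed by this sketch
    double const reciprocal = 1.0 / divisor;  // the only division, done up front
    std::vector<double> out(in.size());
    // One multiplication per element instead of one division.
    std::transform(in.begin(), in.end(), out.begin(),
                   [reciprocal](double val) { return val * reciprocal; });
    return out;
}
```

Whether this is measurably faster depends on the compiler and the hardware, so it is worth benchmarking before committing to it.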