I'm new to std::thread and I'm trying to write a parallel_for.
I wrote the following:
// parallel_for.cpp
// compilation: g++ -O3 -std=c++0x parallel_for.cpp -o parallel_for -lpthread
// execution: time ./parallel_for 100 50000000
// (100: number of threads, 50000000: vector size)
#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <vector>
#include <thread>
#include <cmath>
#include <algorithm>
#include <numeric>
#include <utility>
// Parallel for
template<typename Iterator, class Function>
void parallel_for(const Iterator& first, const Iterator& last, Function&& f, const int nthreads = 1, const int threshold = 1000)
{
    const unsigned int group = std::max(std::max(1, std::abs(threshold)), (last-first)/std::abs(nthreads));
    std::vector<std::thread> threads;
    for (Iterator it = first; it < last; it += group) {
        threads.push_back(std::thread([=](){std::for_each(it, std::min(it+group, last), f);}));
    }
    std::for_each(threads.begin(), threads.end(), [=](std::thread& x){x.join();});
}
// Function to apply
template<typename Type>
void f1(Type& x)
{
    x = std::sin(x)+std::exp(std::cos(x))/std::exp(std::sin(x));
}
// Main
int main(int argc, char* argv[]) {
    const unsigned int nthreads = (argc > 1) ? std::atol(argv[1]) : (1);
    const unsigned int n = (argc > 2) ? std::atol(argv[2]) : (100000000);
    double x = 0;
    std::vector<double> v(n);
    std::iota(v.begin(), v.end(), 0);
    parallel_for(v.begin(), v.end(), f1<double>, nthreads);
    for (unsigned int i = 0; i < n; ++i) x += v[i];
    std::cout<<std::setprecision(15)<<x<<std::endl;
    return 0;
}
But this does not compile (error output from g++ 4.6):
parallel_for.cpp: In instantiation of ‘parallel_for(const Iterator&, const Iterator&, Function&&, int, int) [with Iterator = __gnu_cxx::__normal_iterator<double*, std::vector<double> >, Function = void (&)(double&)]::<lambda()>’:
parallel_for.cpp:22:9: instantiated from ‘void parallel_for(const Iterator&, const Iterator&, Function&&, int, int) [with Iterator = __gnu_cxx::__normal_iterator<double*, std::vector<double> >, Function = void (&)(double&)]’
parallel_for.cpp:43:58: instantiated from here
parallel_for.cpp:22:89: erreur: field ‘parallel_for(const Iterator&, const Iterator&, Function&&, int, int) [with Iterator = __gnu_cxx::__normal_iterator<double*, std::vector<double> >, Function = void (&)(double&)]::<lambda()>::__f’ invalidly declared function type
How can I solve this?
EDIT: this new version compiles but does not give the right result:
// parallel_for.cpp
// compilation: g++ -O3 -std=c++0x parallel_for.cpp -o parallel_for -lpthread
// execution: time ./parallel_for 100 50000000
// (100: number of threads, 50000000: vector size)
#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <vector>
#include <thread>
#include <cmath>
#include <algorithm>
#include <numeric>
#include <utility>
// Parallel for
template<typename Iterator, class Function>
void parallel_for(const Iterator& first, const Iterator& last, Function&& f, const int nthreads = 1, const int threshold = 1000)
{
    const unsigned int group = std::max(std::max(1, std::abs(threshold)), (last-first)/std::abs(nthreads));
    std::vector<std::thread> threads;
    for (Iterator it = first; it < last; it += group) {
        threads.push_back(std::thread([=, &f](){std::for_each(it, std::min(it+group, last), f);}));
    }
    std::for_each(threads.begin(), threads.end(), [](std::thread& x){x.join();});
}
// Function to apply
template<typename Type>
void f(Type& x)
{
    x = std::sin(x)+std::exp(std::cos(x))/std::exp(std::sin(x));
}
// Main
int main(int argc, char* argv[]) {
    const unsigned int nthreads = (argc > 1) ? std::atol(argv[1]) : (1);
    const unsigned int n = (argc > 2) ? std::atol(argv[2]) : (100000000);
    double x = 0;
    double y = 0;
    std::vector<double> v(n);
    std::iota(v.begin(), v.end(), 0);
    std::for_each(v.begin(), v.end(), f<double>);
    for (unsigned int i = 0; i < n; ++i) x += v[i];
    std::iota(v.begin(), v.end(), 0);
    parallel_for(v.begin(), v.end(), f<double>, nthreads);
    for (unsigned int i = 0; i < n; ++i) y += v[i];
    std::cout<<std::setprecision(15)<<x<<" "<<y<<std::endl;
    return 0;
}
The result is:
./parallel_for 1 100
155.524339894552 4950
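The parallel version returns 4950 whereas the sequential version returns 155..... Where is the problem?
Answer 0 (score: 5)
You need to cast (last-first), or otherwise make both std::max arguments the same type. The reason is that implicit type conversions are never performed during template argument deduction.
This works fine (and also fixes the problems that DeadMG and Ben Voigt found). Both versions give 156608294.151782 with n = 100000000.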
template<typename Iterator, class Function>
void parallel_for(const Iterator& first, const Iterator& last, Function&& f, const int nthreads = 1, const int threshold = 1000)
{
    const unsigned int group = std::max(std::max(ptrdiff_t(1), ptrdiff_t(std::abs(threshold))), ((last-first))/std::abs(nthreads));
    std::vector<std::thread> threads;
    threads.reserve(nthreads);
    Iterator it = first;
    // compare distances: forming last-group would be undefined when group > last-first
    for (; last - it > group; it += group) {
        threads.push_back(std::thread([=,&f](){std::for_each(it, std::min(it+group, last), f);}));
    }
    std::for_each(it, last, f); // last steps while we wait for other threads
    std::for_each(threads.begin(), threads.end(), [](std::thread& x){x.join();});
}
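Since the final step for_each(it, last, f) covers no more elements than any of the other steps, we may as well let the calling thread do it while it waits for the other results.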
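To illustrate the deduction point in isolation, here is a minimal sketch. The helper name `group_size` is hypothetical and not part of the code above; it only extracts the chunk-size expression. `std::max` declares both parameters with the same deduced type `T`, so mixing `int` and `std::ptrdiff_t` fails to compile on platforms where the two types differ. Naming `T` explicitly is one way around that, and the casts used above are another.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: computes the chunk size the same way parallel_for does.
std::ptrdiff_t group_size(std::ptrdiff_t total, int nthreads, int threshold)
{
    assert(nthreads != 0); // the division below needs a non-zero thread count
    // Both arguments are int here, so plain deduction works.
    const int min_chunk = std::max(1, std::abs(threshold));
    // Here the arguments are int and ptrdiff_t. Deduction applies no implicit
    // conversion, so T is named explicitly and both arguments convert to it.
    return std::max<std::ptrdiff_t>(min_chunk, total / std::abs(nthreads));
}
```

With 100 elements, 4 threads and a threshold of 10 this gives a chunk size of 25; with the default threshold of 1000 the threshold wins and the whole range ends up in a single chunk.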
Answer 1 (score: 1)
You have to capture the function by reference.
[=, &f] () { /* your code */ };
Now look at this code:
#include <iostream>
template <class T>
void foo(const T& t)
{
    const int a = t;
    [&]
    {
        std::cout << a << std::endl;
    }();
}
int main()
{
    foo(42);
    return 0;
}
clang prints 42, but g++ warns that ‘a’ is used uninitialized in this function and prints 0. That looks like a bug.
Workaround: use const auto (for the variable group in your code).
UPD: I think this is it: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52026
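Applied to the small test case, the workaround could look like the sketch below. The names `foo_auto` and `check_foo_auto` are hypothetical, and the value is returned instead of printed so that it can be checked. On a compiler without the bug both spellings behave the same, so this only shows the shape of the change.

```cpp
#include <cassert>

// Same shape as foo() above, but the local constant is declared `const auto`.
template <class T>
T foo_auto(const T& t)
{
    const auto a = t;            // was: const int a = t;
    return [&] { return a; }();  // the lambda reads the captured constant
}

// Usage check: the lambda must see the value that was passed in.
inline void check_foo_auto()
{
    assert(foo_auto(42) == 42);
}
```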
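Answer 2 (score: 1)
One problem is that it += group may legally reach last, but producing an iterator beyond the end is undefined behavior. Checking it < last afterwards is too late to fix that.
You need to test last - it while it is still valid instead. (Neither it + group nor last - group is necessarily safe, although the latter should be, given how group is computed.)
For example: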
template<typename Iterator, class Function>
void parallel_for(const Iterator& first, const Iterator& last, Function f, const int nthreads = 1, const int threshold = 100)
{
    // explicit template argument: std::max cannot deduce one type from int and ptrdiff_t
    const unsigned int group = std::max<std::ptrdiff_t>(std::max(1, std::abs(threshold)), (last-first)/std::abs(nthreads));
    std::vector<std::thread> threads;
    threads.reserve(nthreads);
    Iterator it = first;
    for (; last - it > group; it += group) {
        threads.push_back(std::thread([=, &f](){std::for_each(it, it+group, f);}));
    }
    threads.push_back(std::thread([=, &f](){std::for_each(it, last, f);}));
    std::for_each(threads.begin(), threads.end(), [](std::thread& x){x.join();});
}
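Putting the distance test together with the fixes from the other answers, a self-contained variant might look like the sketch below. The name `parallel_for_sketch`, the by-value parameters and the `assert` preconditions are assumptions made for illustration, not the code from the question. It expects random-access iterators and does not try to be exception safe.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <thread>
#include <vector>

// Sketch only: requires random-access iterators, nthreads >= 1, threshold >= 1.
template <typename Iterator, typename Function>
void parallel_for_sketch(Iterator first, Iterator last, Function f,
                         std::ptrdiff_t nthreads, std::ptrdiff_t threshold = 1000)
{
    assert(nthreads >= 1 && threshold >= 1);
    typedef typename std::iterator_traits<Iterator>::difference_type diff_t;
    const diff_t group = std::max<diff_t>(threshold, (last - first) / nthreads);
    std::vector<std::thread> threads;
    Iterator it = first;
    // Test the remaining distance while `it` is still valid, so that neither
    // it + group nor last - group can ever leave the range [first, last].
    while (last - it > group) {
        const Iterator chunk_end = it + group;
        threads.emplace_back([it, chunk_end, f] { std::for_each(it, chunk_end, f); });
        it = chunk_end;
    }
    std::for_each(it, last, f); // final, possibly short, chunk on the calling thread
    for (std::thread& t : threads) t.join();
}
```

A call such as `parallel_for_sketch(v.begin(), v.end(), f<double>, nthreads)` would replace the call in the question's `main`. Copying `f` into each lambda avoids any lifetime question about a captured reference, at the cost of one copy per thread.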
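Answer 3 (score: 0)
You pass std::min(it+group, last) to std::for_each, but the loop still always adds the full group to it. This means that if last is not a multiple of group away from it, you will move it past last, which is UB.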
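The same point can be shown with plain indices. In the sketch below, `chunk_bounds` is a hypothetical helper that splits `[0, n)` into chunks and clamps the step as well as the end of each chunk, so the cursor never moves past `n`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: splits [0, n) into chunks of at most `group` elements.
std::vector<std::pair<std::size_t, std::size_t>>
chunk_bounds(std::size_t n, std::size_t group)
{
    assert(group > 0);
    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    for (std::size_t begin = 0; begin < n; ) {
        // Clamp the step to what is left, so the cursor never passes n.
        const std::size_t step = std::min(group, n - begin);
        chunks.emplace_back(begin, begin + step);
        begin += step;
    }
    return chunks;
}
```

For n = 10 and group = 4 this yields [0, 4), [4, 8) and [8, 10). An unclamped `begin += group` would instead leave the cursor at 12 after the last chunk, which is the iterator overshoot described above.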
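Answer 4 (score: 0)
You need to capture by reference, and you need to cast (last-first). The reason is that implicit type conversions are never performed during template argument deduction.
Also fixing the problem DeadMG found, you end up with the code below.
It works fine; both versions give 156608294.151782 with n = 100000000.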
template<typename Iterator, class Function>
void parallel_for(const Iterator& first, const Iterator& last, Function&& f, const int nthreads = 1, const int threshold = 1000)
{
    const unsigned int group = std::max(std::max(ptrdiff_t(1), ptrdiff_t(std::abs(threshold))), ((last-first))/std::abs(nthreads));
    std::vector<std::thread> threads;
    Iterator it = first;
    // compare distances: forming last-group would be undefined when group > last-first
    for (; last - it > group; it += group) {
        threads.push_back(std::thread([=,&f](){std::for_each(it, std::min(it+group, last), f);}));
    }
    std::for_each(it, last, f); // use calling thread while we wait for the others
    std::for_each(threads.begin(), threads.end(), [](std::thread& x){x.join();});
}
Answer 5 (score: 0)
A VC11 solution; please let me know if it does not work with gcc.
template<typename Iterator, class Function>
void parallel_for( const Iterator& first, const Iterator& last, Function&& f, const size_t nthreads = std::thread::hardware_concurrency(), const size_t threshold = 1 )
{
    const size_t portion = std::max( threshold, (last-first) / nthreads );
    std::vector<std::thread> threads;
    for ( Iterator it = first; it < last; )
    {
        Iterator begin = it;
        // clamp using the remaining distance so no iterator past last is ever formed
        Iterator end = ( size_t(last - it) > portion ) ? it + portion : last;
        threads.push_back( std::thread( [=,&f]() {
            for ( Iterator i = begin; i != end; ++i )
                f(*i); // pass the element, not the iterator
        }));
        it = end;
    }
    std::for_each(threads.begin(), threads.end(), [](std::thread& x){x.join();});
}
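One caveat about the default argument: `std::thread::hardware_concurrency()` is only a hint and is allowed to return 0 when the value cannot be determined, which would make the division by `nthreads` above divide by zero. A minimal guard, using the hypothetical helper name `default_thread_count`, might look like this:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical helper: a default thread count that is always safe to divide by.
std::size_t default_thread_count()
{
    // hardware_concurrency() may return 0 if the value is not computable.
    const unsigned int hint = std::thread::hardware_concurrency();
    const std::size_t n = std::max<std::size_t>(1, hint);
    assert(n >= 1); // postcondition relied on by callers
    return n;
}
```

Using `default_thread_count()` as the default value for `nthreads` would keep the rest of the function unchanged.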