Question

Possible duplicate:
C++ 2011 : std::thread : simple example to parallelize a loop?

Consider the following program, which distributes a computation over the elements of a vector (I have never used std::thread before):

// vectorop.cpp
// compilation: g++ -O3 -std=c++0x vectorop.cpp -o vectorop -lpthread
// execution: time ./vectorop 100 50000000 
// (100: number of threads, 50000000: vector size)
#include <iostream>
#include <iomanip>
#include <cstddef>     // std::size_t
#include <cstdlib>     // std::atol
#include <vector>
#include <thread>
#include <functional>  // std::ref
#include <cmath>
#include <algorithm>
#include <numeric>

// Some calculation that takes some time
template<typename T> 
void f(std::vector<T>& v, unsigned int first, unsigned int last) {
    for (unsigned int i = first; i < last; ++i) {
        v[i] = std::sin(v[i])+std::exp(std::cos(v[i]))/std::exp(std::sin(v[i])); 
    }
}

// Main
int main(int argc, char* argv[]) {

    // Variables
    const int nthreads = (argc > 1) ? std::atol(argv[1]) : (1);
    const int n = (argc > 2) ? std::atol(argv[2]) : (100000000);
    double x = 0;
    std::vector<std::thread> t;
    std::vector<double> v(n);

    // Initialization
    std::iota(v.begin(), v.end(), 0);

    // Start threads
    for (unsigned int i = 0; i < n; i += std::max(1, n/nthreads)) {
        // question 1: 
        // how to compute the first/last indexes attributed to each thread 
        // with a more "elegant" formula ?
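        // std::min<std::size_t> is spelled out because the two arguments have different unsigned types
        std::cout<<i<<" "<<std::min<std::size_t>(i+std::max(1, n/nthreads), v.size())<<std::endl;
        t.push_back(std::thread(f<double>, std::ref(v), i, std::min<std::size_t>(i+std::max(1, n/nthreads), v.size())));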
    }

    // Finish threads
    for (unsigned int i = 0; i < t.size(); ++i) {
        t[i].join();
    }
    // question 2: 
    // how to be sure that all threads are finished here ?
    // how to "wait" for the end of all threads ?

    // Finalization
    for (unsigned int i = 0; i < n; ++i) {
        x += v[i];
    }
    std::cout<<std::setprecision(15)<<x<<std::endl;
    return 0;
}

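Two questions are already embedded in the code.

The third one is: is this code perfectly fine, or can it be written more elegantly with std::thread? I don't know the "good practices" for using std::thread...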

Answer 1

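About the first question, how to compute the range each thread works on: I extracted the constants and gave them names to make the code easier to read. As a matter of good practice I also used a lambda, which makes the code easier to modify: the code in the lambda can only be used here, while the function f can be called from other code across the whole program. Take advantage of this by putting shared parts of the code in a function and parts that are used only once in a lambda.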

const size_t itemsPerThread = std::max(1, n/nthreads);
for (size_t nextIndex = 0; nextIndex < v.size(); nextIndex += itemsPerThread)
{
    const size_t beginIndex = nextIndex;
    const size_t endIndex = std::min(nextIndex + itemsPerThread, v.size());
    std::cout << beginIndex << " " << endIndex << std::endl;
    // Capture the vector by reference and the range bounds by value
    t.push_back(std::thread([&v, beginIndex, endIndex]{ f(v, beginIndex, endIndex); }));
}
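
As a possible alternative to the fixed-size chunks above, the ranges can also be computed so that exactly nthreads ranges are produced and their sizes differ by at most one. The following is only a minimal sketch: `chunkBounds` is a hypothetical helper name that does not appear in the original code, and it assumes the number of parts is greater than zero.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the original code): split [0, n) into
// `parts` contiguous half-open ranges [begin, end) whose sizes differ by at
// most one. Assumes parts > 0.
std::vector<std::pair<std::size_t, std::size_t>>
chunkBounds(std::size_t n, std::size_t parts) {
    std::vector<std::pair<std::size_t, std::size_t>> bounds;
    bounds.reserve(parts);
    const std::size_t base = n / parts;   // minimum size of every range
    const std::size_t extra = n % parts;  // the first `extra` ranges get one more item
    std::size_t begin = 0;
    for (std::size_t k = 0; k < parts; ++k) {
        const std::size_t end = begin + base + (k < extra ? 1 : 0);
        bounds.emplace_back(begin, end);
        begin = end;
    }
    return bounds;
}
```

Each pair could then be passed to f as first and last. When n is smaller than the number of parts, the trailing ranges are empty, which the loop in f handles without doing any work.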

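Advanced use cases would use a thread pool, but what that looks like depends on your application design, and thread pools are not part of the STL. For a good example of a threading model, see the Qt Framework. If you are just getting started with threads, save this for later.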

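The second question was already answered in the comments. The std::thread::join function waits (blocks) until the thread has finished. By calling join on every thread and then reaching the code after the join calls, you can be sure that all threads have finished and can now be destroyed.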
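
The join-everything pattern can be sketched in isolation as follows. This is only an illustration: `runAndJoin` and the atomic counter are hypothetical names introduced here, not part of the original program. Note that a std::thread that is still joinable when it is destroyed calls std::terminate, so every thread should be joined (or detached) before the vector goes out of scope.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not part of the original code): start `count` threads,
// wait for all of them, and return how many have run.
std::size_t runAndJoin(std::size_t count) {
    std::atomic<std::size_t> finished{0};
    std::vector<std::thread> workers;
    workers.reserve(count);
    for (std::size_t k = 0; k < count; ++k) {
        // Each worker only increments the shared atomic counter
        workers.emplace_back([&finished] { ++finished; });
    }
    // join() blocks until the corresponding thread has finished
    for (std::thread& w : workers) {
        if (w.joinable()) {
            w.join();
        }
    }
    // Every worker has completed once this point is reached
    return finished.load();
}
```

Because the function returns only after the second loop, the returned count always equals the number of threads that were started.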

Parallelizing loops with std::thread and good practices
