Question

I have been calling this in OpenMP:

#pragma omp parallel for num_threads(totalThreads)
for(unsigned i=0; i<totalThreads; i++)
{
    workOnTheseEdges(startIndex[i], endIndex[i]);
}

And this with C++11 std::thread (which I believe is just pthreads underneath):

vector<thread> threads;
for(unsigned i=0; i<totalThreads; i++)
{
    threads.push_back(thread(workOnTheseEdges, startIndex[i], endIndex[i]));
}
for (auto& thread : threads)
{
    thread.join();
}

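However, the OpenMP implementation is twice as fast! I would have expected the C++11 threads to be faster, since they are more low-level. Note: the code above is not called just once; it is called in a loop, probably around 10,000 times. Could that have something to do with it?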

Edit: To clarify, in practice I use either the OpenMP version or the C++11 version, not both. With the OpenMP code the run takes 45 seconds; with the C++11 code it takes 100 seconds.

Answer 1

Where does totalThreads come from in your OpenMP version? I bet it is not startIndex.size().

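The OpenMP version queues the requests onto totalThreads worker threads. The C++11 version appears to create startIndex.size() threads, and if that is a large number, it involves a lot of overhead.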
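To illustrate the point about thread count, here is a minimal sketch that caps the number of threads and gives each one a contiguous chunk of the work, rather than starting one thread per item. The helper name runInChunks and its parameters are made up for this example; they do not come from the question's code.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not from the question): calls work(begin, end) over
// the range [0, itemCount) using at most maxThreads threads.
// Returns the number of threads that were actually started.
inline unsigned runInChunks(
    std::size_t itemCount, unsigned maxThreads,
    const std::function<void(std::size_t, std::size_t)>& work)
{
    if (itemCount == 0 || maxThreads == 0) return 0;

    // Never start more threads than there are items.
    const std::size_t threadCount =
        std::min<std::size_t>(itemCount, maxThreads);
    // Chunk size rounded up, so every item is covered.
    const std::size_t chunk = (itemCount + threadCount - 1) / threadCount;

    std::vector<std::thread> threads;
    threads.reserve(threadCount);
    for (std::size_t begin = 0; begin < itemCount; begin += chunk) {
        const std::size_t end = std::min(begin + chunk, itemCount);
        threads.emplace_back(work, begin, end);
    }
    for (auto& t : threads) t.join();

    assert(threads.size() <= threadCount);
    return static_cast<unsigned>(threads.size());
}
```

A reasonable value for maxThreads is std::thread::hardware_concurrency(), keeping in mind that it is allowed to return 0 when the value cannot be determined. This only bounds the number of threads per call; it still creates and joins them on every call.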

Answer 2

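Consider the following code. The OpenMP version runs in 0 seconds, while the C++11 version runs in 50 seconds. This is not because the function is doNothing; it is because the C++11 threads, held in the vector inside the loop, are created and then destroyed on every iteration. OpenMP, on the other hand, actually implements a thread pool. The standard does not require this, but it is what the Intel and AMD implementations do.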

for(int j=1; j<100000; ++j)
{
    if(algorithmToRun == 1)
    {
        vector<thread> threads;
        for(int i=0; i<16; i++)
        {
            threads.push_back(thread(doNothing));
        }
        for(auto& thread : threads) thread.join();
    }
    else if(algorithmToRun == 2)
    {
        #pragma omp parallel for num_threads(16)
        for(unsigned i=0; i<16; i++)
        {
            doNothing();
        }
    }
}
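If the cost really is thread creation and destruction, the C++11 version can avoid it by starting the threads once and reusing them on every iteration, which is roughly what a pooling OpenMP runtime does. The sketch below is one possible way to do that with a mutex and two condition variables; the class name ReusableWorkers and its interface are invented for illustration and are not part of any library.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical class: starts `count` threads once and reuses them for
// every call to run(), instead of creating new threads each time.
class ReusableWorkers {
public:
    explicit ReusableWorkers(unsigned count) {
        for (unsigned id = 0; id < count; ++id)
            workers_.emplace_back([this, id] { loop(id); });
    }

    ~ReusableWorkers() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (auto& w : workers_) w.join();
    }

    // Runs task(id) once on every worker and blocks until all are done.
    void run(const std::function<void(unsigned)>& task) {
        std::unique_lock<std::mutex> lock(mutex_);
        task_ = &task;
        remaining_ = workers_.size();
        ++generation_;  // tells the workers that a new round has started
        wake_.notify_all();
        done_.wait(lock, [this] { return remaining_ == 0; });
        task_ = nullptr;
    }

private:
    void loop(unsigned id) {
        std::size_t seen = 0;  // last round this worker has handled
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            wake_.wait(lock, [&] { return stopping_ || generation_ != seen; });
            if (stopping_) return;
            seen = generation_;
            const std::function<void(unsigned)>* task = task_;
            lock.unlock();
            (*task)(id);  // do the work without holding the lock
            lock.lock();
            if (--remaining_ == 0) done_.notify_one();
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::condition_variable done_;
    const std::function<void(unsigned)>* task_ = nullptr;
    std::size_t generation_ = 0;
    std::size_t remaining_ = 0;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

With something like this, the outer loop would construct ReusableWorkers pool(16) once before the loop and call pool.run(...) inside it, passing a callable that maps the worker id to its startIndex/endIndex pair. Whether this closes the whole gap to OpenMP depends on the workload and should be measured.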

Why does OpenMP outperform threads?
