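I'm using concurrency::parallel_for() in a Windows application (Visual Studio 2017) to parallelize some work in a loop. Everything works, but I was concerned about locking performance, so I tried several mechanisms: std::mutex, Windows CRITICAL_SECTION, and so on.

Then I tried concurrency::critical_section. The documentation makes it sound like it should be faster, because it is aware of the Concurrency Runtime.

Nope. Not only is it not faster, in some cases it is downright dangerous. At first it just blew up my application. In the debugger I could see the Concurrency Runtime creating threads without end. When I changed the partitioner from the default to the static partitioner, everything worked again, but it all ran much slower than with Windows CRITICAL_SECTION or even std::mutex.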
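A lock comparison like the one described above can be timed with a small harness. The following is a minimal sketch, not the benchmark from the application: LockTiming and timeLock are hypothetical names, plain std::thread workers stand in for parallel_for(), and std::mutex is used so the code builds on any platform. It assumes only that the lock type is default-constructible and exposes lock() and unlock(); concurrency::critical_section has the same two members, so on Windows it could presumably be passed as the template argument, while CRITICAL_SECTION would need a thin wrapper.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Result of one timing run (hypothetical helper type, not from the post).
struct LockTiming {
    long long counter;  // final value of the shared counter
    double    millis;   // wall-clock time for the whole run
};

// Starts `threads` workers that each acquire the lock `iters` times.
// Lock must be default-constructible and provide lock() and unlock().
template <typename Lock>
LockTiming timeLock(int threads, int iters)
{
    assert(threads > 0 && iters >= 0);

    Lock lock;
    long long counter = 0;

    const auto start = std::chrono::steady_clock::now();

    std::vector<std::thread> workers;
    workers.reserve(threads);
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&lock, &counter, iters] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;  // tiny critical section, so the lock dominates the cost
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }

    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return {counter, elapsed.count()};
}
```

Calling timeLock<std::mutex>(8, 1000000) and comparing the millis field across lock types gives a rough picture of contention cost. The counter field should always equal threads * iters; if it does not, the lock is not providing mutual exclusion.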
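I was wondering if anyone could explain any of the following:
1. Why does using concurrency::critical_section with the default partitioner make the Concurrency Runtime create threads without end?
2. Why is concurrency::critical_section with parallel_for() so much slower than the other locking mechanisms (even when I use static_partitioner to get it to work)?
3. What is the use case for concurrency::critical_section?

Here is my code: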
#include <algorithm>
#include <vector>
#include <ppl.h>

void nonlinearReconstruction(const std::vector<Image>& window,
                             const Rect& rect,
                             Image& normals)
{
    concurrency::critical_section mtx;

    // This lambda uses the critical section "mtx" to control
    // access to the shared image data in "normals". It reads pixels,
    // does math on them, and then sets other pixels.
    const auto op =
        [&normals, cols, rect, &window, &mtx] (int linearix)
    {
        // Determine what indices to use.
        const auto r  = static_cast<int>(linearix / cols);
        const auto c  = static_cast<int>(linearix % cols);
        const auto r0 = r + rect.top();
        const auto c0 = c + rect.left();
        const auto c1 = std::max(c + rect.left() - 1, 0);
        const auto r1 = r0;
        const auto r2 = std::max(r + rect.top() - 1, 0);
        const auto c2 = c + rect.left();

        // Lock critical section to access shared memory pixels in "normals"
        mtx.lock();
        const auto ninit = normals.getpel(r0, c0).asArray();
        const auto npx   = normals.getpel(r1, c1).asArray();
        const auto npy   = normals.getpel(r2, c2).asArray();
        mtx.unlock();

        // Do heavy duty math on these pixels. I've left out the code but
        // no locking of any kind is done. Just math on local data.
        // ... blah blah blah

        // Lock again to set the corrected pixel in shared memory
        mtx.lock();
        normals.setpel(
            r + rect.top(),
            c + rect.left(),
            NormalVector(ntemp[0], ntemp[1], ntemp[2]));
        // Unlock one final time.
        mtx.unlock();
    };

    // Now call the parallel_for loop with the lambda above.
    // This version causes infinite thread creation
    concurrency::parallel_for(0, (int)totalix, op);

    // This version works but performs much slower with the
    // concurrency::critical_section than with std::mutex or
    // Windows CRITICAL_SECTION
    // concurrency::parallel_for(0, (int)totalix, op, concurrency::static_partitioner());
}
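The manual lock()/unlock() pairs in the lambda can also be written with scoped guards, so the lock is released even if getpel() or setpel() throws. The sketch below is illustrative only and changes nothing about the threading behaviour described in this post. It uses std::mutex with std::lock_guard so it builds anywhere; Pixel, Grid, and correctPixel are hypothetical stand-ins for the Image API, the rect offset is left out, and the "math" is a placeholder average. With the Concurrency Runtime, concurrency::critical_section::scoped_lock plays the same role as std::lock_guard.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

using Pixel = std::array<double, 3>;  // hypothetical stand-in for a normal vector

// Hypothetical stand-in for the Image class used in the post.
struct Grid {
    int rows;
    int cols;
    std::vector<Pixel> data;  // row-major, zero-initialized

    Grid(int r, int c) : rows(r), cols(c), data(static_cast<std::size_t>(r) * c) {}

    Pixel getpel(int r, int c) const { return data[static_cast<std::size_t>(r) * cols + c]; }
    void  setpel(int r, int c, const Pixel& p) { data[static_cast<std::size_t>(r) * cols + c] = p; }
};

// Handles one pixel, addressed by linear index, in the same three phases
// as the lambda: read under the lock, compute without it, write under the lock.
void correctPixel(Grid& normals, std::mutex& mtx, int linearix)
{
    assert(linearix >= 0 && linearix < normals.rows * normals.cols);

    const int r  = linearix / normals.cols;
    const int c  = linearix % normals.cols;
    const int c1 = std::max(c - 1, 0);  // left neighbour, clamped at the edge
    const int r2 = std::max(r - 1, 0);  // upper neighbour, clamped at the edge

    Pixel ninit, npx, npy;
    {
        std::lock_guard<std::mutex> guard(mtx);  // released at the closing brace
        ninit = normals.getpel(r, c);
        npx   = normals.getpel(r, c1);
        npy   = normals.getpel(r2, c);
    }

    // Placeholder for the real math: average the three samples.
    Pixel ntemp;
    for (int k = 0; k < 3; ++k) {
        ntemp[k] = (ninit[k] + npx[k] + npy[k]) / 3.0;
    }

    {
        std::lock_guard<std::mutex> guard(mtx);
        normals.setpel(r, c, ntemp);
    }
}
```

Keeping each guard in its own braces keeps the critical sections as short as the original lock()/unlock() pairs, with the heavy math still running outside the lock.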
A few things I've checked: concurrency::critical_section is not a recursive lock, but neither is std::mutex, and std::mutex works fine.