Question

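I am using concurrency::parallel_for() in a Windows application (Visual Studio 2017) to do some work in a loop. Everything works, but I was worried about locking performance, so I tried various mechanisms: std::mutex, Windows CRITICAL_SECTION, and so on.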

Then I tried concurrency::critical_section. The documentation makes it sound as if it should be faster, because it is aware of the Concurrency Runtime.

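No. Not only is it not faster, in some cases it is downright dangerous. At first it simply blew up my application. In the debugger I could see that the Concurrency Runtime was creating an unbounded number of threads. After I changed the partitioner from the default to the static partitioner, everything worked again, but it was all much slower than with Windows CRITICAL_SECTION or even std::mutex.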

I'm wondering whether anyone can explain any of the following to me:

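Why does using concurrency::critical_section in my lambda with the default partitioner make the Concurrency Runtime create an unbounded number of threads?
Why is parallel_for() with static_partitioner so much slower with concurrency::critical_section than with the other locking mechanisms (even though it does at least work)?
What is the use case for concurrency::critical_section?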

Here is my code:

#include <ppl.h>
#include <algorithm>  // std::max
#include <vector>     // std::vector

void nonlinearReconstruction(const std::vector<Image>& window,
                             const Rect& rect,
                             Image& normals)
{
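    // NOTE: cols, totalix and ntemp are not declared in this excerpt;
    // they come from code the question omits.
    concurrency::critical_section mtx;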

    // This lambda uses the critical section "mtx" to control 
    // access to the shared image data in "normals".  Read pixels,
    // does math on them, and then sets other pixels.

    const auto op =
    [&normals, cols, rect, &window, &mtx] (int linearix)
    {
        // Determine what indices to use.
        const auto r = static_cast<int>(linearix / cols);
        const auto c = static_cast<int>(linearix % cols);
        const auto r0 = r + rect.top();
        const auto c0 = c + rect.left();
        const auto c1 = std::max(c + rect.left() - 1, 0);
        const auto r1 = r0; 
        const auto r2 = std::max(r + rect.top() - 1, 0);
        const auto c2 = c + rect.left();

        // Lock critical section to access shared memory pixels in "normals"

        mtx.lock();
        const auto ninit   = normals.getpel(r0, c0).asArray();
        const auto npx     = normals.getpel(r1, c1).asArray();
        const auto npy     = normals.getpel(r2, c2).asArray();
        mtx.unlock();  

        // Do heavy duty math on these pixels.  I've left out the code but 
        // no locking of any kind is done.  Just math on local data. 

        // ... blah blah blah


        // Lock again to set the corrected pixel in shared memory

        mtx.lock();
        normals.setpel(
            r + rect.top(), 
            c + rect.left(), 
            NormalVector(ntemp[0], ntemp[1], ntemp[2]));

        // Unlock one final time.

        mtx.unlock();
    };

    // Now call the parallel_for loop with the lambda above.
    // This version causes infinite thread creation
    concurrency::parallel_for(0, (int)totalix, op);

    // This version works but performs much slower with the 
    // concurrency::critical_section than with std::mutex or 
    // Windows CRITICAL_SECTION

    //  concurrency::parallel_for(0, (int)totalix, op, concurrency::static_partitioner());
}

A few things I have checked:

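I have verified that the code does not throw any exceptions.
This code is not recursive. (I know concurrency::critical_section is not a recursive lock, but neither is std::mutex, and that works fine.)
I have even stepped through all of it, and I am definitely in control of the routine.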

Why should I use concurrency::critical_section instead of the other locking mechanisms?
