Question

When collecting results into an output container, which style of critical section is better?

// Insert into the output container one object at a time.
vector<float> output;
#pragma omp parallel for
for(int i=0; i<1000000; ++i)
{
    float value = // compute something complicated
    #pragma omp critical
    {
        output.push_back(value);
    }
}

// Insert object into per-thread container; later aggregate those containers.
vector<float> output;
#pragma omp parallel
{
    vector<float> per_thread;
    #pragma omp for
    for(int i=0; i<1000000; ++i)
    {
        float value = // compute something complicated
        per_thread.push_back(value);
    }
    #pragma omp critical
    {
        output.insert(output.end(), per_thread.begin(), per_thread.end());
    }
}

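Edit: The examples above were misleading because they suggest that each iteration pushes exactly one item, which is not true in my case. Here are more accurate examples: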

// Insert into the output container one object at a time.
vector<float> output;
#pragma omp parallel for
for(int i=0; i<1000000; ++i)
{
    int k = // compute number of items
    for( int j=0; j<k; ++j)
    {
        float value = // compute something complicated
        #pragma omp critical
        {
            output.push_back(value);
        }
    }
}

// Insert object into per-thread container; later aggregate those containers.
vector<float> output;
#pragma omp parallel
{
    vector<float> per_thread;
    #pragma omp for
    for(int i=0; i<1000000; ++i)
    {
        int k = // compute number of items
        for( int j=0; j<k; ++j)
        {
            float value = // compute something complicated
            per_thread.push_back(value);
        }
    }
    #pragma omp critical
    {
        output.insert(output.end(), per_thread.begin(), per_thread.end());
    }
}

Answer 1

If you always insert exactly one item per parallel iteration, the correct way is:

std::vector<float> output(1000000);
#pragma omp parallel for
for(int i=0; i<1000000; ++i)
{
    float value = // compute something complicated
    output[i] = value;
}

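Assigning to different elements of a std::vector from different threads is thread-safe, and that holds here because every i is distinct. There is no significant false sharing in this case, since each thread typically works on a large contiguous chunk of the index range and only the chunk boundaries can share a cache line.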

If you do not insert exactly one item per parallel iteration, both of your versions are fundamentally correct.

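Your first version, which uses critical inside the loop, can be very slow because every push_back is serialized. Note that if the computation itself is really slow, it might still be fine overall.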

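The per-thread container with a manual reduction is generally fine. Of course, it makes the order of the results non-deterministic. You can simplify it by using a user-defined reduction.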
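
For illustration, here is a minimal sketch of what such a user-defined reduction could look like, assuming a compiler with OpenMP 4.0 or later (for example g++ with -fopenmp). The names vec_append, item_count, compute and collect are hypothetical; item_count and compute only stand in for the elided computations in the question.

```cpp
#include <vector>

// User-defined reduction (OpenMP 4.0+): each thread gets its own empty
// vector, and the partial vectors are appended to one another at the end.
#pragma omp declare reduction(vec_append : std::vector<float> : \
        omp_out.insert(omp_out.end(), omp_in.begin(), omp_in.end())) \
        initializer(omp_priv = std::vector<float>())

// Hypothetical placeholders for "compute number of items" and
// "compute something complicated" from the question.
static int item_count(int i) { return i % 3; }
static float compute(int i, int j) { return static_cast<float>(i) + 0.25f * j; }

std::vector<float> collect(int n)
{
    std::vector<float> output;
    #pragma omp parallel for reduction(vec_append : output)
    for (int i = 0; i < n; ++i)
    {
        const int k = item_count(i);
        for (int j = 0; j < k; ++j)
        {
            // Inside the loop, output refers to this thread's private copy,
            // so no critical section is needed here.
            output.push_back(compute(i, j));
        }
    }
    return output; // all items are present, but their order is unspecified
}
```

This does the same work as the per-thread container version, but the OpenMP runtime handles the private copies and the final merge. As with the manual version, the order in which the partial vectors are combined is not specified.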

Output container for an OpenMP parallel loop
