High-performance nested loops in OpenMP

Date: 2017-12-16 22:09:13

Tags: openmp

I'm having trouble with nested loops in OpenMP (C). Here is what I want to achieve, in brief:

int a=0, b=1000;
while(a<b) // Not parallel, must run only once.
{
    for (int i = 0; i < N; ++i) // Must be parallel
    {...}

    for (int j = 0; j < N; ++j) // Must be parallel
    {...}
    a++;
}

So I tried the following, but the while loop runs 4 times:

int a=0, b=1000;
omp_set_num_threads(4);
#pragma omp parallel default(shared)
{
    while(a<b) // parallel, but runs 4 times (!)
    {          // adding 'single' doesn't compile (nested regions)
        #pragma omp for
        for (int i = 0; i < N; ++i) // works well
        {...}

        #pragma omp for
        for (int j = 0; j < N; ++j) // works well
        {...}
        a++;
    }
}

After searching the web, I found this implementation:

int a=0, b=1000;
omp_set_num_threads(4);
#pragma omp parallel default(shared) num_threads(1)
{
    while(a<b) // parallel and runs once
    {
        #pragma omp parallel for num_threads(4)
        for (int i = 0; i < N; ++i) // works well
        {...}

        #pragma omp parallel for num_threads(4)
        for (int j = 0; j < N; ++j) // works well
        {...}
        a++;
    }
}

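However, this last version is slow: it takes 5 to 10 times longer than the code without OpenMP. Is there a way to avoid creating a new parallel region at every iteration of the while loop? I have looked at "single" and "master", but worksharing #pragma constructs are not allowed inside those regions.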

1 answer:

Answer 0 (score: 1)

The correct approach is:

int b=1000;
#pragma omp parallel
{
    for (int a = 0; a < b; a++)
    {
        #pragma omp for
        for (int i = 0; i < N; ++i) // works well
        {...}

        #pragma omp for
        for (int j = 0; j < N; ++j) // works well
        {...}
    }
}
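A minimal runnable sketch of this pattern, assuming a hypothetical workload: the size N, the arrays x and y and the loop bodies stand in for the elided "..." and are not from the question. Compile with -fopenmp (GCC/Clang); without it the pragmas are ignored and the code runs serially with the same result.

```c
/* Hypothetical workload: N, x, y and the loop bodies are placeholders
   for the elided "..." in the question. */
#define N 1000

static double x[N], y[N];   /* static storage: zero-initialised */

void run_iterations(int b)
{
    /* One parallel region for the whole computation. */
    #pragma omp parallel
    {
        /* Every thread executes all b outer iterations; 'a' is private
           because it is declared inside the region. */
        for (int a = 0; a < b; a++)
        {
            /* The iterations of i are split among the threads. */
            #pragma omp for
            for (int i = 0; i < N; ++i)
                x[i] += 1.0;
            /* Implicit barrier: all of x is updated before y reads it. */

            #pragma omp for
            for (int j = 0; j < N; ++j)
                y[j] = 2.0 * x[j];
            /* Implicit barrier: y is complete before the next outer iteration. */
        }
    }
}
```

For example, run_iterations(1000) on the zeroed arrays leaves every x[i] at 1000.0 and every y[j] at 2000.0, and the thread team is created only once instead of once per outer iteration.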

This is equivalent to:

int b=1000;
#pragma omp parallel
{
    int a=0; // !!! must be private
    while(a<b)
    {
        #pragma omp for
        for (int i = 0; i < N; ++i) // works well
        {...}

        #pragma omp for
        for (int j = 0; j < N; ++j) // works well
        {...}
        a++;
    }
}

But the latter is harder to read and reason about. In either case, the hidden code (...) must not modify a.

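Yes, every thread executes the outer while loop, but that is correct. When the threads reach an "omp for" worksharing construct, they split its iterations among themselves, and the implicit barrier at the end of each "omp for" keeps them in step. This only works because every thread runs the same number of outer iterations and therefore encounters the same sequence of "omp for" constructs, which is why a must be private and must not be modified by the loop bodies.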
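To check this claim, a small sketch counts how often each loop body runs. The function count_iterations and its counters are hypothetical, not from the question. With T threads the outer body runs b × T times in total (once per thread per iteration), while the inner body runs exactly b × n times, because "omp for" hands each i to exactly one thread.

```c
/* Hypothetical helper: counts executions of the outer and inner loop
   bodies when the outer loop sits inside one parallel region. */
long count_iterations(int b, int n, long *outer_runs)
{
    long outer = 0, inner = 0;

    /* reduction gives each thread private counters, summed at the end */
    #pragma omp parallel reduction(+:outer, inner)
    {
        for (int a = 0; a < b; a++)   /* every thread runs all b iterations */
        {
            outer++;                  /* counted once per thread */

            #pragma omp for
            for (int i = 0; i < n; ++i)
                inner++;              /* each i runs on exactly one thread */
        }
    }

    *outer_runs = outer;
    return inner;
}
```

Without -fopenmp (one thread) both totals equal the serial counts; with OpenMP enabled the inner total stays at b × n while the outer total grows with the team size.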