如何在OpenMP中使用孤立的for循环?

时间:2011-05-19 19:16:30

标签: c openmp parallel-processing

已解决:请参阅下面的编辑2

我正在尝试并行化一种算法,该算法对矩阵进行一些操作(为简单起见,我们称之为模糊)。一旦完成此操作,它就会发现旧矩阵和新矩阵之间的最大变化(每个元素基础上新旧矩阵之间的绝对差值的最大值)。如果此最大差异高于某个阈值,则再进行矩阵运算的迭代。

所以我的主程序有以下循环:

converged = 0;
for( i = 1; i <= iteration_limit; i++ ){
    max_diff = update( &data_grid );

    if( max_diff < tol ) {
        converged = 1;
        break;
    }
}

update( &data_grid )然后调用模糊算法的实际实现。模糊算法然后遍历矩阵,这是我试图并行化的循环:

for( i = 0; i < width; i++ ) {
    for( j = 0; j <= height; j++ ) {
        g->data[ update ][ i ][ j ] = 
        ONE_QUARTER * ( 
                     g->data[ update ][ i + 1 ][ j     ] +
                     g->data[ update ][ i - 1 ][ j     ] +
                     g->data[ update ][ i     ][ j + 1 ] +
                     g->data[ update ][ i     ][ j - 1 ] +
                     );
        diff = fabs( g->data[ old ][ i ][ j ] - g->data[ update ][ i ][ j ] );
        maxdiff = maxdiff > diff ? maxdiff : diff;
    }
}

我可以在update(&data_grid)内部放置一个并行区域,但这意味着在我试图避免的每次迭代中都会创建和销毁线程。:

#pragma omp parallel for private(i, j, diff, maxdg) shared(width, height, update, g, dg, chunksize) default(none) schedule(static, chunksize)

我有两个网格副本,并通过在oldupdate之间切换01,在每次迭代中在“另一个”中写下新答案。

修改

所以我根据Jonathan Dursi的建议制作了一个孤立的omp for loop,但由于某种原因,似乎找不到线程之间的最大值......

这是我的“外部”代码:

  converged = 0;

  #pragma omp parallel shared(i, max_iter, g, tol, maxdg, dg) private(converged) default(none)
  {
      for( i = 1; i <= 40; i++ ){

          maxdg = 0;

          dg = grid_update( &g );

          printf("[%d] dg from a single thread: %f\n", omp_get_thread_num(), dg );


  #pragma omp critical
          {
              if (dg > maxdg) maxdg = dg;
          }

  #pragma omp barrier
  #pragma omp flush

          printf("[%d] maxdg: %f\n", omp_get_thread_num(), maxdg);

          if( maxdg < tol ) {
              converged = 1;
              break;
          }
      }
  }

结果:

  [11] dg from a single thread: 0.000000
  [3] dg from a single thread: 0.000000
  [4] dg from a single thread: 0.000000
  [5] dg from a single thread: 0.000000
  [0] dg from a single thread: 0.166667
  [6] dg from a single thread: 0.000000
  [7] dg from a single thread: 0.000000
  [8] dg from a single thread: 0.000000
  [9] dg from a single thread: 0.000000
  [15] dg from a single thread: 0.000000
  [10] dg from a single thread: 0.000000
  [1] dg from a single thread: 0.166667
  [12] dg from a single thread: 0.000000
  [13] dg from a single thread: 0.000000
  [14] dg from a single thread: 0.000000
  [2] maxdg: 0.000000
  [3] maxdg: 0.000000
  [0] maxdg: 0.000000
  [8] maxdg: 0.000000
  [9] maxdg: 0.000000
  [4] maxdg: 0.000000
  [5] maxdg: 0.000000
  [6] maxdg: 0.000000
  [7] maxdg: 0.000000
  [1] maxdg: 0.000000
  [14] maxdg: 0.000000
  [11] maxdg: 0.000000
  [15] maxdg: 0.000000
  [10] maxdg: 0.000000
  [12] maxdg: 0.000000
  [13] maxdg: 0.000000

编辑2: 私人/共享分类器犯了一些错误,忘记了障碍。这是正确的代码:

  #pragma omp parallel shared(max_iter, g, tol, maxdg) private(i, dg, converged) default(none)
  {
      for( i = 1; i <= max_iter; i++ ){

  #pragma omp barrier
          maxdg=0;
  /*#pragma omp flush */

          dg = grid_update( &g );

  #pragma omp critical
          {
              if (dg > maxdg) maxdg = dg;
          }

  #pragma omp barrier
  /*#pragma omp flush*/

          if( maxdg < tol ) {
              converged = 1;
              break;
          }
      }
  }

1 个答案:

答案 0 :(得分:1)

并行部分在for之前的另一个例程中开始没有问题,当然是自OpenMP 3.0(2008)以来,也许是自OpenMP 2.5以来。用gcc4.4:

outer.c:

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

void update(int n, int iter);

int main(int argc, char **argv) {
    int n=10;

    #pragma omp parallel num_threads(4) default(none) shared(n)
    for (int iter=0; iter<3; iter++)
    {
        #pragma omp single
        printf("---iteration %d---\n", iter);
        update(n, iter);
    }

    return 0;
}

inner.c:

#include <omp.h>
#include <stdio.h>

void update(int n, int iter) {
    int thread = omp_get_thread_num();

    #pragma omp for
    for  (int i=0;i<n;i++) {
        int newthread=omp_get_thread_num();
        printf("%3d: doing loop index %d.\n",newthread,i);
    }
}

大厦:

$ make
gcc44 -g -fopenmp -std=c99   -c -o inner.o inner.c
gcc44 -g -fopenmp -std=c99   -c -o outer.o outer.c
gcc44 -o main outer.o inner.o -fopenmp -lgomp
$ ./main 
---iteration 0---
  2: doing loop index 6.
  2: doing loop index 7.
  2: doing loop index 8.
  0: doing loop index 0.
  0: doing loop index 1.
  0: doing loop index 2.
  1: doing loop index 3.
  1: doing loop index 4.
  1: doing loop index 5.
  3: doing loop index 9.
---iteration 1---
  0: doing loop index 0.
  0: doing loop index 1.
  0: doing loop index 2.
  1: doing loop index 3.
  1: doing loop index 4.
  1: doing loop index 5.
  3: doing loop index 9.
  2: doing loop index 6.
  2: doing loop index 7.
  2: doing loop index 8.
---iteration 2---
  0: doing loop index 0.
  0: doing loop index 1.
  0: doing loop index 2.
  3: doing loop index 9.
  2: doing loop index 6.
  2: doing loop index 7.
  2: doing loop index 8.
  1: doing loop index 3.
  1: doing loop index 4.
  1: doing loop index 5.

但是根据@ jdv-Jan de Vaan的说法,如果在最新的OpenMP实现中,这会导致比更新并行更高的性能,特别是如果更新足够昂贵,我会非常惊讶。

顺便说一句,在更新中,在Gauss-Seidel例程中只需将i-loop放在并行周围就会出现问题;你可以看到i步骤不是独立的,这将导致竞争条件。您将需要执行类似Red-Black或Jacobi迭代的操作......

<强>更新

提供的代码示例是用于G-S迭代,而不是Jacobi,但我只是假设这是一个错字。

如果您的问题实际上是关于reduce而不是孤立的for循环:是的,您可能不得不在OpenMP中滚动自己的最小/最大缩减,但它非常简单,您只需使用常用的技巧。

更新2 - yikes,locmax需要是私有的,不能共享。

outer.c:

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int update(int n, int iter);

int main(int argc, char **argv) {
    int n=10;
    int max, locmax;

    max = -999;

    #pragma omp parallel num_threads(4) default(none) shared(n, max) private(locmax)
    for (int iter=0; iter<3; iter++)
    {
        #pragma omp single
            printf("---iteration %d---\n", iter);

        locmax = update(n, iter);

        #pragma omp critical
        {
            if (locmax > max) max=locmax;
        }

        #pragma omp barrier
        #pragma omp flush

        #pragma omp single
            printf("---iteration %d's max value = %d---\n", iter, max);
    }
    return 0;
}

inner.c:

#include <omp.h>
#include <stdio.h>

int update(int n, int iter) {
    int thread = omp_get_thread_num();
    int max = -999;

    #pragma omp for
    for  (int i=0;i<n;i++) {
        printf("%3d: doing loop index %d.\n",thread,i);
        if (i+iter>max) max = i+iter;
    }

    return max;
}

并建立:

$ make
gcc44 -g -fopenmp -std=c99   -c -o inner.o inner.c
gcc44 -g -fopenmp -std=c99   -c -o outer.o outer.c
gcc44 -o main outer.o inner.o -fopenmp -lgomp
bash-3.2$ ./main 
---iteration 0---
  0: doing loop index 0.
  0: doing loop index 1.
  0: doing loop index 2.
  2: doing loop index 6.
  2: doing loop index 7.
  2: doing loop index 8.
  1: doing loop index 3.
  1: doing loop index 4.
  1: doing loop index 5.
  3: doing loop index 9.
---iteration 0's max value = 9---
---iteration 1---
  0: doing loop index 0.
  0: doing loop index 1.
  0: doing loop index 2.
  3: doing loop index 9.
  2: doing loop index 6.
  2: doing loop index 7.
  2: doing loop index 8.
  1: doing loop index 3.
  1: doing loop index 4.
  1: doing loop index 5.
---iteration 1's max value = 10---
---iteration 2---
  0: doing loop index 0.
  0: doing loop index 1.
  0: doing loop index 2.
  1: doing loop index 3.
  1: doing loop index 4.
  1: doing loop index 5.
  3: doing loop index 9.
  2: doing loop index 6.
  2: doing loop index 7.
  2: doing loop index 8.
---iteration 2's max value = 11---