Question

See the edit below for my initial solution.

Consider the following code:

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void) {
    int counter = 0;
    int i;

    omp_set_num_threads(8);

    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        #pragma omp for private(i)
        for (i = 0; i < 10; i++) {
            printf("i: %d thread: %d\n", i, id);
            #pragma omp critical // or atomic
            counter++;
        }
    }

    printf("counter %d\n", counter);

    return 0;
}

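I set the number of threads to 8. I want each of the 8 threads to run its own for loop that increments the variable counter. However, OpenMP seems to parallelize the for loop instead, dividing its iterations among the threads: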

i: 0 thread: 0
i: 1 thread: 0
i: 4 thread: 2
i: 6 thread: 4
i: 2 thread: 1
i: 3 thread: 1
i: 7 thread: 5
i: 8 thread: 6
i: 5 thread: 3
i: 9 thread: 7
counter 10

So counter = 10, but I want counter = 80. What can I do so that each thread runs its own for loop, with all threads incrementing counter?

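Edit: The following code gives the desired result. I added another, outer for loop that runs from 0 to the maximum number of threads. Inside this loop I can declare my for loop private to each thread. Indeed, counter = 80 in this case. Is this the best way to solve the problem, or is there a better one?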

int main(void) {
    int counter = 0;
    int i, n;

    omp_set_num_threads(8);

    int mthreads = omp_get_max_threads();

    // n is the parallel loop variable, so it is private automatically;
    // i has to be made private explicitly.
    #pragma omp parallel for private(i)
    for (n = 0; n < mthreads; n++) {
        int id = omp_get_thread_num();
        for (i = 0; i < 10; i++) {
            printf("i: %d thread: %d\n", i, id);
            #pragma omp critical
            counter++;
        }
    }

    printf("counter %d\n", counter);

    return 0;
}

Answer 1

The solution is quite simple: remove the worksharing construct for:

#pragma omp parallel
    { 
        int id = omp_get_thread_num();
        for (int i = 0; i<10; i++) {
            printf("i: %d thread: %d\n", i, id);
            #pragma omp critical // or atomic
            counter++;
        }
    }

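Declaring i inside the control part of the for statement is a C99 feature and may require you to pass an option such as -std=c99 to the compiler. Otherwise, you can simply declare i at the beginning of the block. Or you can declare it outside the region and make it private: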

int i;

#pragma omp parallel private(i)
    { 
        int id = omp_get_thread_num();
        for (i = 0; i<10; i++) {
            printf("i: %d thread: %d\n", i, id);
            #pragma omp critical // or atomic
            counter++;
        }
    }
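
As a rough check of the difference, here is a minimal sketch (not part of the original answer) that puts the two forms side by side. The function names count_with_worksharing and count_per_thread are made up for illustration, and the #ifdef _OPENMP guards are only there so the file still compiles when OpenMP is switched off, in which case the team size is 1.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* With the worksharing construct: the 10 iterations are divided among
   the threads of the team, so the body runs 10 times in total. */
int count_with_worksharing(void) {
    int counter = 0;
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < 10; i++) {
            #pragma omp atomic
            counter++;
        }
    }
    return counter;
}

/* Without it: every thread of the team runs all 10 iterations itself.
   The actual team size is reported through team_size. */
int count_per_thread(int *team_size) {
    int counter = 0;
    int threads = 1;  /* stays 1 when compiled without OpenMP */
    #pragma omp parallel
    {
        #ifdef _OPENMP
        #pragma omp single
        threads = omp_get_num_threads();
        #endif
        for (int i = 0; i < 10; i++) {
            #pragma omp atomic
            counter++;
        }
    }
    *team_size = threads;
    return counter;
}
```

With a team of 8 threads, count_with_worksharing() should return 10 and count_per_thread() should return 80, which matches the numbers in the question.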

Since you are not using the value of counter inside the parallel region, you can also use a sum reduction:

#pragma omp parallel reduction(+:counter)
    { 
        int id = omp_get_thread_num();
        for (int i = 0; i<10; i++) {
            printf("i: %d thread: %d\n", i, id);
            counter++;
        }
    }
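
To make the reduction less of a black box, the following sketch spells out by hand roughly what reduction(+:counter) does conceptually: each thread counts in its own private variable, and the partial sums are combined once at the end. This is an illustration only, not a description of how any particular OpenMP runtime implements the clause. The function name count_with_private_copies and its parameters are hypothetical.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hand-written approximation of reduction(+:counter).
   Every thread runs `iterations` increments on a private copy and then
   adds its partial sum to the shared counter exactly once. */
int count_with_private_copies(int iterations, int *team_size) {
    int counter = 0;
    int threads = 1;  /* stays 1 when compiled without OpenMP */
    #pragma omp parallel
    {
        int local = 0;  /* private copy, starts at 0, the identity for + */
        for (int i = 0; i < iterations; i++) {
            local++;    /* no synchronization needed here */
        }

        #pragma omp atomic
        counter += local;  /* one synchronized update per thread */

        #ifdef _OPENMP
        #pragma omp single
        threads = omp_get_num_threads();
        #endif
    }
    *team_size = threads;
    return counter;
}
```

Compared with a critical or atomic increment inside the loop, this synchronizes once per thread instead of once per iteration, which is the reason a reduction is usually the cheaper choice.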

Answer 2

OpenMP has a concept for exactly this: reduction. Sticking with your example:

#pragma omp parallel for reduction(+:counter)
  for (unsigned n=0; n<mthreads; n++) {
    int id = omp_get_thread_num();
    for (unsigned i = 0; i<10; i++) {
      printf("i: %u thread: %d\n", i, id);
      counter++;
    }
  }

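This has the advantage that no critical section has to be defined around the increment. OpenMP itself collects the sum of all the per-thread copies of counter, and is likely to do so more efficiently.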

The same thing can be expressed even more simply as

#pragma omp parallel for reduction(+:counter)
  for (unsigned i=0; i<mthreads*10; i++) {
    int id = omp_get_thread_num();
    printf("i: %u thread: %d\n", i, id);
    counter++;
  }

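With some compilers you may still have to pass a flag such as -std=c99 to be able to declare variables inside the for statement. The advantage of declaring variables as locally as possible is that you do not have to declare them private or anything like that. The simplest approach is of course to let OpenMP split the for loop by itself.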
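
To see how OpenMP splits a single flat loop, here is a small sketch that records how many iterations each thread received. The names TEAM, count_flat and per_thread are assumptions made for this example. The num_threads(TEAM) clause caps the team at 8 threads so that the thread id always fits into the array.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

#define TEAM 8  /* hypothetical team size, matching the 8 threads above */

/* Runs `total` iterations of one flat loop and lets OpenMP distribute
   them. per_thread[t] receives the number of iterations thread t ran. */
int count_flat(int total, int per_thread[TEAM]) {
    int counter = 0;
    for (int t = 0; t < TEAM; t++) {
        per_thread[t] = 0;
    }

    #pragma omp parallel for reduction(+:counter) num_threads(TEAM)
    for (int k = 0; k < total; k++) {
        int id = 0;  /* single thread when compiled without OpenMP */
#ifdef _OPENMP
        id = omp_get_thread_num();
#endif
        per_thread[id]++;  /* each thread only touches its own slot */
        counter++;
    }
    return counter;
}
```

Calling count_flat(80, per_thread) returns 80 regardless of how the iterations are scheduled. The entries of per_thread always add up to 80, but how they are distributed depends on the schedule and on the number of threads actually available.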

Private 'for' loop for each thread in OpenMP
