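Solved: see Edit 2 below.
I am trying to parallelize an algorithm that performs some operation on a matrix (let's call it a blur for simplicity). Once that operation is done, it finds the largest change between the old and the new matrix (the maximum, over all elements, of the absolute difference between the old and new values). If this maximum difference is above some threshold, it does another iteration of the matrix operation.
So my main program has the following loop: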
converged = 0;
for( i = 1; i <= iteration_limit; i++ ){
max_diff = update( &data_grid );
if( max_diff < tol ) {
converged = 1;
break;
}
}
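update( &data_grid ) then calls the actual implementation of the blur algorithm. The blur algorithm then iterates over the matrix, and this is the loop I am trying to parallelize: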
for( i = 0; i < width; i++ ) {
for( j = 0; j <= height; j++ ) {
g->data[ update ][ i ][ j ] =
ONE_QUARTER * (
g->data[ update ][ i + 1 ][ j ] +
g->data[ update ][ i - 1 ][ j ] +
g->data[ update ][ i ][ j + 1 ] +
g->data[ update ][ i ][ j - 1 ]
);
diff = fabs( g->data[ old ][ i ][ j ] - g->data[ update ][ i ][ j ] );
maxdiff = maxdiff > diff ? maxdiff : diff;
}
}
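I could put a parallel region inside update(&data_grid), but that would mean threads are created and destroyed on every iteration, which is what I am trying to avoid: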
#pragma omp parallel for private(i, j, diff, maxdg) shared(width, height, update, g, dg, chunksize) default(none) schedule(static, chunksize)
I have two copies of the grid, and on every iteration I write the new answer into the "other" one by switching old and update between 0 and 1.
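A minimal sketch of that switching, assuming a struct that stores the two indices next to the data. The names grid_t, N and grid_swap are made up for illustration and are not taken from the actual program:

```c
#include <assert.h>

enum { N = 4 };              /* illustrative grid size */

/* Hypothetical layout: two copies of the grid plus the two role indices. */
typedef struct {
    double data[2][N][N];
    int old;                 /* copy that holds the previous iterate */
    int update;              /* copy that receives the new iterate   */
} grid_t;

/* Exchange the roles of the two copies after a sweep. */
void grid_swap(grid_t *g)
{
    assert(g->old + g->update == 1);   /* the indices are always 0 and 1 */
    g->old    = g->update;
    g->update = 1 - g->old;
}
```

Calling grid_swap once per outer iteration means no grid data is ever copied; only the two small indices change.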
Edit
So, following Jonathan Dursi's suggestion, I made an orphaned omp for loop, but for some reason I cannot seem to find the maximum across threads...
Here is my "outer" code:
converged = 0;
#pragma omp parallel shared(i, max_iter, g, tol, maxdg, dg) private(converged) default(none)
{
for( i = 1; i <= 40; i++ ){
maxdg = 0;
dg = grid_update( &g );
printf("[%d] dg from a single thread: %f\n", omp_get_thread_num(), dg );
#pragma omp critical
{
if (dg > maxdg) maxdg = dg;
}
#pragma omp barrier
#pragma omp flush
printf("[%d] maxdg: %f\n", omp_get_thread_num(), maxdg);
if( maxdg < tol ) {
converged = 1;
break;
}
}
}
Result:
[11] dg from a single thread: 0.000000
[3] dg from a single thread: 0.000000
[4] dg from a single thread: 0.000000
[5] dg from a single thread: 0.000000
[0] dg from a single thread: 0.166667
[6] dg from a single thread: 0.000000
[7] dg from a single thread: 0.000000
[8] dg from a single thread: 0.000000
[9] dg from a single thread: 0.000000
[15] dg from a single thread: 0.000000
[10] dg from a single thread: 0.000000
[1] dg from a single thread: 0.166667
[12] dg from a single thread: 0.000000
[13] dg from a single thread: 0.000000
[14] dg from a single thread: 0.000000
[2] maxdg: 0.000000
[3] maxdg: 0.000000
[0] maxdg: 0.000000
[8] maxdg: 0.000000
[9] maxdg: 0.000000
[4] maxdg: 0.000000
[5] maxdg: 0.000000
[6] maxdg: 0.000000
[7] maxdg: 0.000000
[1] maxdg: 0.000000
[14] maxdg: 0.000000
[11] maxdg: 0.000000
[15] maxdg: 0.000000
[10] maxdg: 0.000000
[12] maxdg: 0.000000
[13] maxdg: 0.000000
Edit 2: I made some mistakes in the private/shared clauses and forgot a barrier. Here is the correct code:
#pragma omp parallel shared(max_iter, g, tol, maxdg) private(i, dg, converged) default(none)
{
for( i = 1; i <= max_iter; i++ ){
#pragma omp barrier
maxdg=0;
/*#pragma omp flush */
dg = grid_update( &g );
#pragma omp critical
{
if (dg > maxdg) maxdg = dg;
}
#pragma omp barrier
/*#pragma omp flush*/
if( maxdg < tol ) {
converged = 1;
break;
}
}
}
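The same merge pattern, reduced to a single self-contained function, might look like the sketch below. It is only an illustration: max_of and its arguments are hypothetical, and it takes the maximum of a plain array instead of calling grid_update. Each thread keeps a private running maximum, and the shared value is only touched inside the critical section.

```c
#include <assert.h>

/* Hypothetical helper: maximum of x[0..n-1], merged across threads by hand. */
double max_of(const double *x, int n)
{
    assert(n > 0);
    double global_max = x[0];               /* shared result */

    #pragma omp parallel default(none) shared(x, n, global_max)
    {
        double local_max = x[0];            /* private: declared inside the region */

        /* Each thread scans its own share of the indices. */
        #pragma omp for nowait
        for (int i = 1; i < n; i++)
            if (x[i] > local_max) local_max = x[i];

        /* One thread at a time folds its local result into the shared one. */
        #pragma omp critical
        {
            if (local_max > global_max) global_max = local_max;
        }
    }   /* implicit barrier: global_max is complete here */

    return global_max;
}
```

Because the shared variable is read only after the end of the parallel region, no explicit barrier or flush is needed in this reduced form.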
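Answer 0 (score: 1)
There is no problem with the parallel region starting in a different routine from the one containing the for; this has certainly been allowed since OpenMP 3.0 (2008), and perhaps since OpenMP 2.5. With gcc 4.4: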
outer.c:
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
void update(int n, int iter);
int main(int argc, char **argv) {
int n=10;
#pragma omp parallel num_threads(4) default(none) shared(n)
for (int iter=0; iter<3; iter++)
{
#pragma omp single
printf("---iteration %d---\n", iter);
update(n, iter);
}
return 0;
}
inner.c:
#include <omp.h>
#include <stdio.h>
void update(int n, int iter) {
int thread = omp_get_thread_num();
#pragma omp for
for (int i=0;i<n;i++) {
int newthread=omp_get_thread_num();
printf("%3d: doing loop index %d.\n",newthread,i);
}
}
Building and running:
$ make
gcc44 -g -fopenmp -std=c99 -c -o inner.o inner.c
gcc44 -g -fopenmp -std=c99 -c -o outer.o outer.c
gcc44 -o main outer.o inner.o -fopenmp -lgomp
$ ./main
---iteration 0---
2: doing loop index 6.
2: doing loop index 7.
2: doing loop index 8.
0: doing loop index 0.
0: doing loop index 1.
0: doing loop index 2.
1: doing loop index 3.
1: doing loop index 4.
1: doing loop index 5.
3: doing loop index 9.
---iteration 1---
0: doing loop index 0.
0: doing loop index 1.
0: doing loop index 2.
1: doing loop index 3.
1: doing loop index 4.
1: doing loop index 5.
3: doing loop index 9.
2: doing loop index 6.
2: doing loop index 7.
2: doing loop index 8.
---iteration 2---
0: doing loop index 0.
0: doing loop index 1.
0: doing loop index 2.
3: doing loop index 9.
2: doing loop index 6.
2: doing loop index 7.
2: doing loop index 8.
1: doing loop index 3.
1: doing loop index 4.
1: doing loop index 5.
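But, as per @jdv-Jan de Vaan, I would be very surprised if, with an up-to-date OpenMP implementation, this gave noticeably better performance than putting the parallel for inside update, especially if update is expensive enough.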
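By the way, simply wrapping the i loop in a parallel for will cause problems in a Gauss-Seidel routine like the one in update: the i steps are not independent of each other, since each one reads neighbouring values that other steps write, and this leads to race conditions. You will need to do something like Red-Black or Jacobi iteration instead...
Update
The code sample provided is for a G-S iteration, not Jacobi, but I am just assuming that's a typo.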
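As a rough illustration of the difference (not the asker's code): a Jacobi sweep reads only the old copy and writes only the new one, so the i iterations are independent and the loop can be split across threads safely. The function name jacobi_sweep, the fixed sizes NX and NY, and the restriction to interior points are all assumptions made for this sketch:

```c
#include <assert.h>

enum { NX = 4, NY = 4 };    /* illustrative size, including boundary cells */

/* One Jacobi sweep over the interior: reads only src, writes only dst. */
void jacobi_sweep(double src[NX][NY], double dst[NX][NY])
{
    /* With src == dst this would turn back into the racy in-place update. */
    assert(src != dst);

    #pragma omp parallel for default(none) shared(src, dst)
    for (int i = 1; i < NX - 1; i++) {
        for (int j = 1; j < NY - 1; j++) {
            dst[i][j] = 0.25 * (src[i + 1][j] + src[i - 1][j] +
                                src[i][j + 1] + src[i][j - 1]);
        }
    }
}
```

No thread ever reads a value that another thread writes during the same sweep, which is exactly the property the in-place version lacks.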
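If your question is actually about the reduction rather than the orphaned for loop: yes, you may have to roll your own min/max reduction in OpenMP, but it is pretty straightforward; you just use the usual tricks.
Update 2 - yikes, locmax needs to be private, not shared.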
outer.c:
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
int update(int n, int iter);
int main(int argc, char **argv) {
int n=10;
int max, locmax;
max = -999;
#pragma omp parallel num_threads(4) default(none) shared(n, max) private(locmax)
for (int iter=0; iter<3; iter++)
{
#pragma omp single
printf("---iteration %d---\n", iter);
locmax = update(n, iter);
#pragma omp critical
{
if (locmax > max) max=locmax;
}
#pragma omp barrier
#pragma omp flush
#pragma omp single
printf("---iteration %d's max value = %d---\n", iter, max);
}
return 0;
}
inner.c:
#include <omp.h>
#include <stdio.h>
int update(int n, int iter) {
int thread = omp_get_thread_num();
int max = -999;
#pragma omp for
for (int i=0;i<n;i++) {
printf("%3d: doing loop index %d.\n",thread,i);
if (i+iter>max) max = i+iter;
}
return max;
}
And building and running:
$ make
gcc44 -g -fopenmp -std=c99 -c -o inner.o inner.c
gcc44 -g -fopenmp -std=c99 -c -o outer.o outer.c
gcc44 -o main outer.o inner.o -fopenmp -lgomp
bash-3.2$ ./main
---iteration 0---
0: doing loop index 0.
0: doing loop index 1.
0: doing loop index 2.
2: doing loop index 6.
2: doing loop index 7.
2: doing loop index 8.
1: doing loop index 3.
1: doing loop index 4.
1: doing loop index 5.
3: doing loop index 9.
---iteration 0's max value = 9---
---iteration 1---
0: doing loop index 0.
0: doing loop index 1.
0: doing loop index 2.
3: doing loop index 9.
2: doing loop index 6.
2: doing loop index 7.
2: doing loop index 8.
1: doing loop index 3.
1: doing loop index 4.
1: doing loop index 5.
---iteration 1's max value = 10---
---iteration 2---
0: doing loop index 0.
0: doing loop index 1.
0: doing loop index 2.
1: doing loop index 3.
1: doing loop index 4.
1: doing loop index 5.
3: doing loop index 9.
2: doing loop index 6.
2: doing loop index 7.
2: doing loop index 8.
---iteration 2's max value = 11---
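For readers on newer compilers: OpenMP 3.1 added max and min as reduction operators for C and C++, so the hand-rolled critical section is no longer the only option. A minimal sketch, assuming a compiler that supports OpenMP 3.1 or later (so newer than the gcc 4.4 used above); max_abs_diff and its arguments are hypothetical names:

```c
#include <assert.h>

/* Hypothetical helper: largest |a[i] - b[i]| over n elements. */
double max_abs_diff(const double *a, const double *b, int n)
{
    assert(n > 0);
    double maxdiff = 0.0;

    /* Each thread works on a private maxdiff; OpenMP combines them with max. */
    #pragma omp parallel for default(none) shared(a, b, n) reduction(max:maxdiff)
    for (int i = 0; i < n; i++) {
        double d = a[i] - b[i];
        if (d < 0.0) d = -d;            /* absolute value without math.h */
        if (d > maxdiff) maxdiff = d;
    }
    return maxdiff;
}
```

The clause replaces the private local maximum, the critical section and the barrier from the examples above with a single annotation on the loop.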