I'm having some trouble with nested loops in OpenMP (C). Briefly, this is what I want to achieve:
int a=0, b=1000;
while(a<b) // Not parallel, must run only once.
{
    for (int i = 0; i < N; ++i) // Must be parallel
    {...}
    for (int j = 0; j < N; ++j) // Must be parallel
    {...}
    a++;
}
So I tried the following, but the while loop runs 4 times:
int a=0, b=1000;
omp_set_num_threads(4);
#pragma omp parallel default(shared)
{
    while(a<b) // executed by every thread, so it runs 4 times /!\
    {          // adding 'single' here doesn't compile (nested regions)
        #pragma omp for
        for (int i = 0; i < N; ++i) // works well
        {...}
        #pragma omp for
        for (int j = 0; j < N; ++j) // works well
        {...}
        a++;
    }
}
After searching the web, I found this implementation:
int a=0, b=1000;
omp_set_num_threads(4);
#pragma omp parallel default(shared) num_threads(1)
{
    while(a<b) // runs only once (the team has a single thread)
    {
        #pragma omp parallel for num_threads(4)
        for (int i = 0; i < N; ++i) // works well
        {...}
        #pragma omp parallel for num_threads(4)
        for (int j = 0; j < N; ++j) // works well
        {...}
        a++;
    }
}
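However, this last version is slow: it takes 5 to 10 times longer than the code without OpenMP. Is it possible to avoid opening a new parallel region at every step of the while loop? I have looked at "single" and "master", but a worksharing construct such as #pragma omp for is not allowed inside those regions.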
Answer (score: 1)
The correct approach looks like this:
int b=1000;
#pragma omp parallel
{
    for (int a = 0; a < b; a++) // every thread runs this loop; a is private
    {
        #pragma omp for
        for (int i = 0; i < N; ++i) // works well
        {...}
        #pragma omp for
        for (int j = 0; j < N; ++j) // works well
        {...}
    }
}
This is equivalent to:
int b=1000;
#pragma omp parallel
{
    int a=0; // !!! must be private
    while(a<b)
    {
        #pragma omp for
        for (int i = 0; i < N; ++i) // works well
        {...}
        #pragma omp for
        for (int j = 0; j < N; ++j) // works well
        {...}
        a++;
    }
}
The latter, however, is harder to read and reason about. In either case, the hidden code (...) must not modify a.
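Yes, every thread executes the outer while loop, but that is correct. Each thread runs the same number of outer iterations, so all threads encounter the same sequence of omp for constructs, as OpenMP requires. When they reach an omp for worksharing construct, they divide its iterations among themselves and execute the inner loop correctly, and the implicit barrier at the end of each omp for keeps them in step.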