Question

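I want to use OpenMP to parallelize the tasks inside a for loop. However, I don't want to use #pragma omp parallel for, because the result of iteration (i + 1) depends on the output of iteration (i). I tried spawning threads inside the loop, but the cost of creating and destroying them on every iteration is too high. An abstract description of my code: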

int a_old=1;
int b_old=1;
int c_old=1;
int d_old=1;
for (int i=0; i<1000; i++)
{
   a_new = fun(a_old);  //fun() depends only on the value of the argument
   a_old = a_new;

   b_new = fun(b_old);
   b_old = b_new;

   c_new = fun(c_old);
   c_old = c_new;

   d_new = fun(d_old);
   d_old = d_new;
}

How can I efficiently use threads to compute the new values of a_new, b_new, c_new and d_new in parallel in each iteration?

Answer 1

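This is simple: as @kbr mentioned in the comments, the computations of a, b, c and d are independent of each other, so you can split them across separate threads and pass each thread its value as an argument. Sample code: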

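#include <stdio.h>
#include <pthread.h>

//A pthread start routine must take a void * and return a void *.
void *thread_func(void *arg)
{
    int *i = arg;
    for (int j=0; j<1000; j++)
    {
        //Instead of incrementing, you can call whichever function you want here.
        (*i)++;
    }
    return NULL;
}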

int main()
{
    int a_old=1;
    int b_old=1;
    int c_old=1;
    int d_old=1;
    pthread_t thread[4];

    pthread_create(&thread[0],0,thread_func,&a_old);
    pthread_create(&thread[1],0,thread_func,&b_old);
    pthread_create(&thread[2],0,thread_func,&c_old);
    pthread_create(&thread[3],0,thread_func,&d_old);

    //pthread_join takes the pthread_t itself, not a pointer to it.
    pthread_join(thread[0],NULL);
    pthread_join(thread[1],NULL);
    pthread_join(thread[2],NULL);
    pthread_join(thread[3],NULL);

    printf("a_old %d\n",a_old);
    printf("b_old %d\n",b_old);
    printf("c_old %d\n",c_old);
    printf("d_old %d\n",d_old);

}

Answer 2

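Simply don't parallelize the code inside the for loop; move the parallel region outside instead. This reduces the overhead of thread creation and work sharing. You can then easily apply OpenMP sections: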

int a_old=1;
int b_old=1;
int c_old=1;
int d_old=1;
int a_new, b_new, c_new, d_new;  //each one is written by exactly one section
#pragma omp parallel sections
{
   #pragma omp section
   for (int i=0; i<1000; i++) {
       a_new = fun(a_old);  //fun() depends only on the value of the argument
       a_old = a_new;
   }
   #pragma omp section
   for (int i=0; i<1000; i++) {
      b_new = fun(b_old);
      b_old = b_new;
   }
   #pragma omp section
   for (int i=0; i<1000; i++) {
      c_new = fun(c_old);
      c_old = c_new;
   }
   #pragma omp section
   for (int i=0; i<1000; i++) {
       d_new = fun(d_old);
       d_old = d_new;
   }
}

There is a further simplification:

int value[4] = {1, 1, 1, 1};  //same start values as a_old..d_old
#pragma omp parallel for
for (int abcd = 0; abcd < 4; abcd++) {
    for (int i=0; i<1000; i++) {
        value[abcd] = fun(value[abcd]);
    }
}
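Both versions above assume the four chains never need to look at each other's results. If the per-iteration structure from the question has to be kept (for example because something must happen between iterations), one possible variant is to open the parallel region once and put only a work-sharing loop inside the iteration loop. The following is a minimal sketch, not a definitive implementation: fun() here is a hypothetical stand-in that just adds 1, and step_in_lockstep and NUM_CHAINS are names made up for illustration.

```c
#include <assert.h>

#define NUM_CHAINS 4   /* a, b, c, d from the question */

/* Hypothetical stand-in for fun(): depends only on its argument. */
static int fun(int x)
{
    return x + 1;
}

/* Advances all chains one step at a time. The thread team is created
   once, outside the iteration loop, and reused for every iteration. */
static void step_in_lockstep(int values[NUM_CHAINS], int iterations)
{
    assert(iterations >= 0);
    #pragma omp parallel
    {
        /* Every thread runs this loop; i is private to each thread. */
        for (int i = 0; i < iterations; i++) {
            /* The four updates of one iteration are divided among the threads. */
            #pragma omp for
            for (int k = 0; k < NUM_CHAINS; k++) {
                values[k] = fun(values[k]);
            }
            /* Implicit barrier at the end of the omp for: no thread starts
               iteration i + 1 before iteration i is complete. */
        }
    }
}
```

Called as step_in_lockstep(values, 1000) on an array initialized to 1, each element ends up at 1001 with this stand-in. The barrier after every iteration has a cost of its own, so when the chains really are independent, the versions without per-iteration synchronization are likely to be faster.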

In any case, if fun executes quickly, you may want to add padding between the values to prevent false sharing.
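One way to add such padding is to wrap each value in a struct that fills a whole cache line. This is only a sketch under stated assumptions: the 64-byte line size is an assumption (it is common but hardware dependent), fun() is again a hypothetical stand-in that adds 1, and padded_int, run_chains, CACHE_LINE and NUM_CHAINS are illustrative names.

```c
#include <assert.h>

#define CACHE_LINE 64   /* assumed cache-line size in bytes */
#define NUM_CHAINS 4    /* a, b, c, d from the question */

/* Hypothetical stand-in for fun(): depends only on its argument. */
static int fun(int x)
{
    return x + 1;
}

/* Each value is followed by padding, so consecutive values lie
   CACHE_LINE bytes apart and cannot share a 64-byte line. */
struct padded_int {
    int value;
    char pad[CACHE_LINE - sizeof(int)];
};

/* Advances every chain by `iterations` steps; the chains are independent,
   so the outer loop over chains can be parallelized. */
static void run_chains(struct padded_int chains[NUM_CHAINS], int iterations)
{
    assert(iterations >= 0);
    #pragma omp parallel for
    for (int k = 0; k < NUM_CHAINS; k++) {
        for (int i = 0; i < iterations; i++) {
            chains[k].value = fun(chains[k].value);
        }
    }
}
```

An alternative that avoids padding is to copy the value into a local variable inside the parallel loop, iterate on the local copy, and write it back once at the end.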

Parallelizing code inside a for loop with OpenMP
