如何并行化这个结构?

时间:2016-04-06 09:14:58

标签: c++ multithreading parallel-processing openmp multicore

我正在尝试使用C ++中的OpenMP并行化以下结构:

x1,x2,y1,y2,k1,k2 = 0;
a1,a2,b1,b2; //initialized to some value
vec1,vec2;

for (i=0;i<N;++i) {
   for (j=0;j<M;++j) {
      x2 = j - a2;
      y2 = i - b2;

      func(x1,y1,x2,y2); // the function changes x1,y1 values

      x2 = x1;
      y2 = y1;

      func2(x1,y1,x2,y2); // the function changes x1,y1 values

      x1 += a1;
      y1 += b1;

      k1 = func3(x1,y1);

      vec2[k2] = vec1[k1];
      vec2[k2+1] = vec1[k1+1];

      k2 += 2;
   }
}
你可以帮助我吗?我非常感谢您提供的任何帮助。

修改

我尝试的最后一个解决方案是:

x1,x2,y1,y2,k1,k2 = 0;
a1,a2,b1,b2; //initialized to some value
vec1,vec2;

#pragma omp parallel for ordered schedule(dynamic,1) collapse(2)
for (i=0;i<N;++i) {
   for (j=0;j<M;++j) {
      x2 = j - a2;
      y2 = i - b2;

      func(x1,y1,x2,y2); // the function changes x1,y1 values

      x2 = x1;
      y2 = y1;

      func2(x1,y1,x2,y2); // the function changes x1,y1 values

#pragma omp critical
{
      x1 += a1;
      y1 += b1;
}

      k1 = func3(x1,y1);

#pragma omp ordered
{
      vec2[k2] = vec1[k1];
      vec2[k2+1] = vec1[k1+1];
}

#pragma omp atomic
      k2 += 2;
   }
}

导致分段错误。

1 个答案:

答案 0 :(得分:1)

我用@Guiroux阅读你的聊天,并确定你的循环是独立的。目前从迭代到迭代的唯一事情是k2,可以直接从ij计算(k2 = 2*j+2*M*i)。

因此,您的并行化代码可以只是

int k1, k2;
double x1,x2,y1,y2;
double a1,a2,b1,b2; //initialized to some values
double vec1[2*M*N],vec2[2*M*N]; // vec1 is read-only past this point

/* 
   NOTE: private(var) means that it may no longer have same value it had previously.
   In your case, these were all set to 0 before. After reading chat, it seems that
   they were never actually used as input anyway. As long as they are written to 
   before being read from (i.e. x1,y1 in func), no seg fault will occur.
*/
#pragma omp parallel for private(x2,y2,x1,y1,k1,k2)
for (int i=0; i<N; ++i) {      // I defined i,j here
   for (int j=0; j<M; ++j) {   // If you define them outside parallel for, 
                               // they must also be made private

      // variables set below are all private, so no threads will overwrite 
      // work that other threads have done
      x2 = j - a2;
      y2 = i - b2;

      func(x1,y1,x2,y2); // the function sets x1,y1 values

      x2 = x1;
      y2 = y1;

      func2(x1,y1,x2,y2); // the function sets x1,y1 values

      x1 += a1;
      y1 += b1;

      k1 = func3(x1,y1);

      // k2 is never the same for 2 different values of i, 
      // so different threads will never clobber each other here:
      k2 = 2*j+2*M*i;
      vec2[k2] = vec1[k1];
      vec2[k2+1] = vec1[k1+1];

   }
}

您不必折叠两个循环。只要N > nCoresOnYourComputer,您就可以看到如何在所有处理器之间平均分配工作。

希望一切都有道理。尝试并在我的代码中找出为什么我必须定义某些变量private(与默认的shared相比),以及为什么k2必须像我一样重新定义。

练习留给读者:你如何设置k2(也安全地避免竞争条件),但这主要保留了你最初使用的逻辑(即每次迭代k2 += 2)?