I am trying to parallelize the following structure using OpenMP in C++:
x1,x2,y1,y2,k1,k2 = 0;
a1,a2,b1,b2; //initialized to some value
vec1,vec2;
for (i=0;i<N;++i) {
    for (j=0;j<M;++j) {
        x2 = j - a2;
        y2 = i - b2;
        func(x1,y1,x2,y2); // the function changes x1,y1 values
        x2 = x1;
        y2 = y1;
        func2(x1,y1,x2,y2); // the function changes x1,y1 values
        x1 += a1;
        y1 += b1;
        k1 = func3(x1,y1);
        vec2[k2] = vec1[k1];
        vec2[k2+1] = vec1[k1+1];
        k2 += 2;
    }
}
Can you help me? I would really appreciate any help.
Edit
The last solution I tried was:
x1,x2,y1,y2,k1,k2 = 0;
a1,a2,b1,b2; //initialized to some value
vec1,vec2;
#pragma omp parallel for ordered schedule(dynamic,1) collapse(2)
for (i=0;i<N;++i) {
    for (j=0;j<M;++j) {
        x2 = j - a2;
        y2 = i - b2;
        func(x1,y1,x2,y2); // the function changes x1,y1 values
        x2 = x1;
        y2 = y1;
        func2(x1,y1,x2,y2); // the function changes x1,y1 values
        #pragma omp critical
        {
            x1 += a1;
            y1 += b1;
        }
        k1 = func3(x1,y1);
        #pragma omp ordered
        {
            vec2[k2] = vec1[k1];
            vec2[k2+1] = vec1[k1+1];
        }
        #pragma omp atomic
        k2 += 2;
    }
}
This results in a segmentation fault.
Answer 0 (score: 1)
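I read your chat with @Guiroux and concluded that your loop iterations are independent. The only thing currently carried over from one iteration to the next is k2, which can be computed directly from i and j (k2 = 2*j+2*M*i).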
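To see that the closed-form index really matches the running counter from the question, here is a minimal sketch. The helper names offset_for and offsets_match_running_counter are made up for this illustration and are not part of the original code.

```cpp
#include <cassert>
#include <cstddef>

// Closed-form write offset for iteration (i, j) of an N x M loop nest
// in which every iteration writes two consecutive elements.
std::size_t offset_for(std::size_t i, std::size_t j, std::size_t M) {
    assert(j < M);  // j is the inner index, so it must stay below M
    return 2 * j + 2 * M * i;
}

// Walks the loop nest sequentially with the original running counter
// (k2 += 2) and reports whether the closed form agrees at every step.
bool offsets_match_running_counter(std::size_t N, std::size_t M) {
    std::size_t k2 = 0;  // running counter, as in the question
    for (std::size_t i = 0; i < N; ++i) {
        for (std::size_t j = 0; j < M; ++j) {
            if (offset_for(i, j, M) != k2) {
                return false;
            }
            k2 += 2;
        }
    }
    return k2 == 2 * M * N;  // total number of elements written
}
```

Because the offset depends only on i and j, each iteration can work out where to write without knowing what any other iteration has done.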
Thus, your parallelized code can simply be
int k1, k2;
double x1,x2,y1,y2;
double a1,a2,b1,b2; //initialized to some values
double vec1[2*M*N],vec2[2*M*N]; // vec1 is read-only past this point
/*
  NOTE: private(var) means that it may no longer have the same value it had previously.
  In your case, these were all set to 0 before. After reading the chat, it seems that
  they were never actually used as input anyway. As long as they are written to
  before being read from (i.e. x1,y1 in func), no seg fault will occur.
*/
#pragma omp parallel for private(x2,y2,x1,y1,k1,k2)
for (int i=0; i<N; ++i) {      // I defined i,j here
    for (int j=0; j<M; ++j) {  // If you define them outside parallel for,
                               // they must also be made private
        // variables set below are all private, so no threads will overwrite
        // work that other threads have done
        x2 = j - a2;
        y2 = i - b2;
        func(x1,y1,x2,y2); // the function sets x1,y1 values
        x2 = x1;
        y2 = y1;
        func2(x1,y1,x2,y2); // the function sets x1,y1 values
        x1 += a1;
        y1 += b1;
        k1 = func3(x1,y1);
        // k2 is never the same for 2 different values of i,
        // so different threads will never clobber each other here:
        k2 = 2*j+2*M*i;
        vec2[k2] = vec1[k1];
        vec2[k2+1] = vec1[k1+1];
    }
}
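You don't have to collapse the two loops. As long as N > nCoresOnYourComputer, the outer loop alone gives every core something to do, and you can see how the work gets split roughly evenly among all the processors.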
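If you want to check the parallel loop against a serial run, something like the sketch below could work. Note that func, func2 and func3 here are hypothetical stand-ins (the real ones were never posted), and the sizes N, M and the values of a1, a2, b1, b2 are arbitrary choices for the illustration. The only assumptions that matter are that the functions are pure and that func3 returns an index for which k1 and k1+1 are both valid in vec1.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr int N = 64;  // outer trip count (arbitrary for this sketch)
constexpr int M = 48;  // inner trip count (arbitrary for this sketch)

// Hypothetical stand-ins for the question's functions. They are pure:
// the outputs depend on the arguments alone.
void func(double& x1, double& y1, double x2, double y2) {
    x1 = 0.5 * x2;
    y1 = 0.25 * y2;
}
void func2(double& x1, double& y1, double x2, double y2) {
    x1 = x2 + 1.0;
    y1 = y2 + 2.0;
}
int func3(double x1, double y1) {
    // Map to an even index so that k1 and k1 + 1 are both in range.
    const int cell = static_cast<int>(std::fabs(x1 + y1)) % (N * M);
    return 2 * cell;
}

// Runs the loop nest from the answer. With parallel == false the if()
// clause makes OpenMP execute the region on a single thread.
std::vector<double> remap(const std::vector<double>& vec1, bool parallel) {
    assert(vec1.size() == static_cast<std::size_t>(2 * N * M));
    std::vector<double> vec2(vec1.size());
    const double a1 = 3.0, a2 = 1.0, b1 = 5.0, b2 = 2.0;  // assumed values
    double x1, y1, x2, y2;
    int k1, k2;
    #pragma omp parallel for private(x1, y1, x2, y2, k1, k2) if(parallel)
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < M; ++j) {
            x2 = j - a2;
            y2 = i - b2;
            func(x1, y1, x2, y2);
            x2 = x1;
            y2 = y1;
            func2(x1, y1, x2, y2);
            x1 += a1;
            y1 += b1;
            k1 = func3(x1, y1);
            k2 = 2 * j + 2 * M * i;  // unique for every (i, j) pair
            vec2[k2] = vec1[k1];
            vec2[k2 + 1] = vec1[k1 + 1];
        }
    }
    return vec2;
}
```

Fill vec1 with known values, call remap(vec1, false) and remap(vec1, true), and compare the two results. Build with your compiler's OpenMP flag (for example -fopenmp with GCC or Clang); without it the pragma is ignored and both calls run serially.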
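Hope that all makes sense. Try to work out from my code why I had to declare certain variables private (as opposed to the default shared), and why k2 had to be redefined the way I did.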
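As a hint on private versus shared, here is a small sketch on a toy loop (not your code; both function names are invented for the example). It shows two ways of giving each thread its own scratch variable: a private clause on a variable declared outside the loop, and a variable declared inside the loop body, which is private automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scratch variable declared outside the loop: it would be shared by
// default, so it needs the private clause to avoid a data race.
std::vector<int> squares_private_clause(int n) {
    assert(n >= 0);
    std::vector<int> out(static_cast<std::size_t>(n));
    int tmp = 0;
    #pragma omp parallel for private(tmp)
    for (int i = 0; i < n; ++i) {
        tmp = i * i;   // written before it is read, so the
        out[i] = tmp;  // uninitialized private copy is harmless
    }
    return out;
}

// Scratch variable declared inside the loop body: every iteration gets
// its own copy, so no clause is needed.
std::vector<int> squares_block_scope(int n) {
    assert(n >= 0);
    std::vector<int> out(static_cast<std::size_t>(n));
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        const int tmp = i * i;
        out[i] = tmp;
    }
    return out;
}
```

Both versions produce the same result. If you dropped private(tmp) from the first one, all threads would read and write the same tmp, which is the kind of race that x1, y1, x2, y2 and k1 would suffer from in your loop.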
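Exercise left to the reader: how could you set k2 in a way that still safely avoids race conditions, but mostly keeps the logic you originally used (i.e. k2 += 2 every iteration)?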
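One possible answer, as a sketch only: reset k2 at the top of every outer iteration and keep the k2 += 2 inside the inner loop. The function name fill_pairs is invented for this example, and it writes i and j into vec2 as placeholder values instead of vec1[k1] and vec1[k1+1], so that the write positions are easy to inspect.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fills vec2 using a per-row running counter. Slot k2 receives i and
// slot k2 + 1 receives j (placeholder values for this sketch).
std::vector<int> fill_pairs(int N, int M) {
    assert(N >= 0 && M >= 0);
    std::vector<int> vec2(static_cast<std::size_t>(2) * N * M);
    #pragma omp parallel for
    for (int i = 0; i < N; ++i) {
        // Declared inside the parallel loop, so each iteration of the
        // outer loop has its own k2. It starts where row i begins.
        int k2 = 2 * M * i;
        for (int j = 0; j < M; ++j) {
            vec2[k2] = i;
            vec2[k2 + 1] = j;
            k2 += 2;  // same increment as in the original code
        }
    }
    return vec2;
}
```

This only works because the outer loop is the one being distributed and the inner loop runs sequentially within a thread. With collapse(2) the inner iterations of one row could land on different threads, and you would have to go back to computing k2 from i and j.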