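I need to perform several operations on threadprivate arrays and then sum their contents into global shared arrays. I have tried two approaches. The first uses the atomic directive: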
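```c
#define N 100

void eval_local_xyz(void);  /* defined elsewhere (not shown); signature assumed */

int i;
#pragma omp threadprivate(i)
double local_x[N], local_y[N], local_z[N], x[N], y[N], z[N];
#pragma omp threadprivate(local_x, local_y, local_z)

int main() {
    for (i = 0; i < N; i++) x[i] = y[i] = z[i] = 0.;
    // the content of local_x, local_y, local_z is now changed
    // (called outside a parallel region, this only changes the calling thread's
    //  copies unless eval_local_xyz opens its own parallel region)
    eval_local_xyz();
    // now we want to collect the local arrays into the global ones
    #pragma omp parallel
    {
        for (i = 0; i < N; i++) {
            #pragma omp atomic
            x[i] += local_x[i];
            #pragma omp atomic
            y[i] += local_y[i];
            #pragma omp atomic
            z[i] += local_z[i];
        }
    }
}
```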
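To try the atomic approach in isolation, here is a minimal self-contained sketch under stated assumptions: fill_local_xyz and collect_atomic are hypothetical names, and fill_local_xyz stands in for eval_local_xyz by writing multiples of 0.5 into each thread's copies from inside the parallel region. Those values keep every sum exact, so the totals can be checked against the number of contributing threads. The sketch uses only pragmas (no omp.h calls), so it also builds without -fopenmp, in which case it simply runs on one thread.

```c
#define N 100

int i;  /* loop index, one copy per thread */
#pragma omp threadprivate(i)
double local_x[N], local_y[N], local_z[N];  /* per-thread partial results */
#pragma omp threadprivate(local_x, local_y, local_z)
double x[N], y[N], z[N];  /* shared totals */

/* Hypothetical stand-in for eval_local_xyz(): fills the calling thread's
   copies with multiples of 0.5, so every partial sum below is exact. */
static void fill_local_xyz(void) {
    for (i = 0; i < N; i++) {
        local_x[i] = 0.5 * i;
        local_y[i] = 1.0;
        local_z[i] = -0.5 * i;
    }
}

/* Zeroes x, y, z, then lets every thread fill its local arrays and add them
   element by element with atomic updates. Returns the number of threads. */
int collect_atomic(void) {
    int nthreads = 0;  /* declared outside the region, so it is shared */
    for (i = 0; i < N; i++) x[i] = y[i] = z[i] = 0.0;
    #pragma omp parallel
    {
        fill_local_xyz();
        #pragma omp atomic
        nthreads++;
        for (i = 0; i < N; i++) {
            #pragma omp atomic
            x[i] += local_x[i];
            #pragma omp atomic
            y[i] += local_y[i];
            #pragma omp atomic
            z[i] += local_z[i];
        }
    }
    return nthreads;
}
```

After a call, each element should equal the per-thread value times the returned thread count, e.g. y[0] equals the thread count and x[10] equals 5.0 times it.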
The second uses named critical sections:
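```c
#define N 100

void eval_local_xyz(void);  /* defined elsewhere (not shown); signature assumed */

int i;
#pragma omp threadprivate(i)
double local_x[N], local_y[N], local_z[N], x[N], y[N], z[N];
#pragma omp threadprivate(local_x, local_y, local_z)

int main() {
    for (i = 0; i < N; i++) x[i] = y[i] = z[i] = 0.;
    // the content of local_x, local_y, local_z is now changed
    // (called outside a parallel region, this only changes the calling thread's
    //  copies unless eval_local_xyz opens its own parallel region)
    eval_local_xyz();
    // now we want to collect the local arrays into the global ones
    #pragma omp parallel
    {
        #pragma omp critical (sumx)
        for (i = 0; i < N; i++) x[i] += local_x[i];
        #pragma omp critical (sumy)
        for (i = 0; i < N; i++) y[i] += local_y[i];
        #pragma omp critical (sumz)
        for (i = 0; i < N; i++) z[i] += local_z[i];
    }
}
```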
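The critical-section approach can be exercised the same way. This is again a minimal sketch under assumptions: fill_local_xyz and collect_critical are hypothetical names, and fill_local_xyz stands in for eval_local_xyz by filling each thread's copies inside the parallel region with exact multiples of 0.5. Each thread enters each named critical section once and runs the whole loop while holding it, instead of issuing N separate atomic updates per array.

```c
#define N 100

int i;  /* loop index, one copy per thread */
#pragma omp threadprivate(i)
double local_x[N], local_y[N], local_z[N];  /* per-thread partial results */
#pragma omp threadprivate(local_x, local_y, local_z)
double x[N], y[N], z[N];  /* shared totals */

/* Hypothetical stand-in for eval_local_xyz(): fills the calling thread's
   copies with multiples of 0.5, so every partial sum below is exact. */
static void fill_local_xyz(void) {
    for (i = 0; i < N; i++) {
        local_x[i] = 0.5 * i;
        local_y[i] = 1.0;
        local_z[i] = -0.5 * i;
    }
}

/* Zeroes x, y, z, then lets every thread fill its local arrays and add each
   whole array inside its own named critical section. Returns thread count. */
int collect_critical(void) {
    int nthreads = 0;  /* declared outside the region, so it is shared */
    for (i = 0; i < N; i++) x[i] = y[i] = z[i] = 0.0;
    #pragma omp parallel
    {
        fill_local_xyz();
        #pragma omp atomic
        nthreads++;
        /* one lock acquisition per array per thread */
        #pragma omp critical (sumx)
        for (i = 0; i < N; i++) x[i] += local_x[i];
        #pragma omp critical (sumy)
        for (i = 0; i < N; i++) y[i] += local_y[i];
        #pragma omp critical (sumz)
        for (i = 0; i < N; i++) z[i] += local_z[i];
    }
    return nthreads;
}
```

Because the test values are exact multiples of 0.5, these totals do not depend on the order in which threads enter the critical sections; with arbitrary values they can.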
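For large N, the second approach appears to be faster than the first. However, the two approaches give slightly different results. My question is: should these two approaches produce identical results?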
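One fact that bears on this question: IEEE floating-point addition is not associative, so adding the same contributions in a different order can change the last bits of a sum. In both versions, the order in which threads add their contributions into x[i], y[i] and z[i] depends on thread scheduling. The minimal C sketch below (function names are illustrative, not from the question) shows the effect with three values on a typical x86-64/SSE target.

```c
/* The same three terms, summed left-to-right. */
double sum_left_first(double a, double b, double c) {
    return (a + b) + c;
}

/* The same three terms, summed right-to-left. */
double sum_right_first(double a, double b, double c) {
    return a + (b + c);
}
```

With a = 0.1, b = 0.2, c = 0.3, the two results differ by one unit in the last place: the right-to-left sum equals the double nearest 0.6, while the left-to-right sum is the next representable double above it.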