Question

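I have just started learning OpenMP programming and am stuck on a piece of code that tries to parallelize a program for computing pi. I can't understand what the following line does in the program, or what the comment after it means.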

  if (id == 0) nthreads = nthrds; //Only one thread should copy the number of threads to the global value to make sure multiple threads writing to the same address don’t conflict.

The whole code is:

#include<omp.h>
#include<stdio.h>
#define NUM_THREADS 2

static long num_steps = 100000;
double step;

int main ()
{
   int i, nthreads;
   double pi, sum[NUM_THREADS];
   step = 1.0/(double) num_steps;

   omp_set_num_threads(NUM_THREADS);
   double time1 = omp_get_wtime();
   #pragma omp parallel
   {
       int i, id,nthrds;
       double x;
       id = omp_get_thread_num();
       nthrds = omp_get_num_threads();
       // Only one thread should copy the number of threads to the global
       // value, to make sure multiple threads writing to the same address
       // don't conflict.
       if (id == 0) nthreads = nthrds;

       for (i=id, sum[id]=0.0;i< num_steps; i=i+nthrds){
             x = (i+0.5)*step;
             sum[id] += 4.0/(1.0+x*x);
       }
   }
   double time2 = omp_get_wtime();

   for(i=0, pi=0.0;i<nthreads;i++)pi += sum[i] * step;
   printf("%lf\n",pi);
   printf("%lf\n",(time2-time1));

}

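I tried running it without the if statement, and it gave 0 as the value of pi, whereas otherwise it runs correctly (giving 3.141593). When I instead assigned nthreads globally, outside the parallel region, to the total number of threads (i.e. 2), it still gave the correct value of pi. Can someone explain the difference in output?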

Thanks!

Answer 1

The variable nthreads needs to be set for the summation step in the final loop:

for(i=0, pi=0.0;i<nthreads;i++)pi += sum[i] * step;

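Removing the assignment breaks this loop, because nthreads is then never initialized and the loop bound is garbage. Let me try to rephrase the comment on why you can't simply do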

nthreads = nthrds;

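If you write to a shared memory location from multiple threads without any protection, you have a data race and the value may be wrong. Often one uses atomic as protection. In this case, #pragma omp single nowait is more appropriate. I guess the idea of writing this variable dynamically, instead of just using NUM_THREADS, is that you are not always guaranteed to get the number of threads you asked for.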
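The single nowait variant mentioned above could look roughly like the following. This is only a sketch, not the tutorial's code: the function name pi_spmd, the MAX_THREADS bound, and the serial fallback stubs under #else are assumptions added for illustration, so that the block also builds when OpenMP is not enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed serial fallbacks so the sketch also builds without -fopenmp. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
static void omp_set_num_threads(int n) { (void)n; }
#endif

#define MAX_THREADS 64  /* hypothetical upper bound for this sketch */

/* Hypothetical helper: midpoint-rule estimate of pi in the SPMD style
   of the question, with the team size recorded by a single thread. */
double pi_spmd(long num_steps, int requested)
{
    double sum[MAX_THREADS] = {0.0};
    double step = 1.0 / (double)num_steps;
    int nthreads = 1;  /* shared: actual team size, written exactly once */

    if (requested > MAX_THREADS) requested = MAX_THREADS;
    if (requested < 1) requested = 1;
    omp_set_num_threads(requested);

    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        int nthrds = omp_get_num_threads();

        /* Exactly one thread stores the team size. nowait skips the
           barrier after single; that is fine here because nthreads is
           only read after the parallel region has ended. */
        #pragma omp single nowait
        nthreads = nthrds;

        /* Each thread handles steps id, id+nthrds, id+2*nthrds, ... */
        for (long i = id; i < num_steps; i += nthrds) {
            double x = (i + 0.5) * step;
            sum[id] += 4.0 / (1.0 + x * x);
        }
    }

    /* Combine the per-thread partial sums. */
    double pi = 0.0;
    for (int i = 0; i < nthreads; i++)
        pi += sum[i] * step;
    return pi;
}
```

Calling pi_spmd(100000, 2) corresponds to the settings in the question. Compared with if (id == 0), the single construct does not care which thread does the write, only that exactly one does.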

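Anyway, there is a big problem with this tutorial. It tries to teach OpenMP using the raw primitives rather than the appropriate idiomatic high-level tools. This leads to lots of confusion. I think it is a bad way to teach OpenMP, especially if you don't follow the tutorial exactly.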

The actual correct way to do it is given later in the tutorial (I have modernized it a bit):

double sum = 0.0;
double step = 1.0/(double) num_steps;
omp_set_num_threads(NUM_THREADS);
#pragma omp parallel for reduction(+:sum)
for (int i=0; i < num_steps; i++) {
    double x = (i+0.5)*step;
    sum = sum + 4.0/(1.0+x*x);
}
double pi = step * sum;

Answer 2

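When you assigned nthreads globally to the total number of threads (i.e. 2), it still gave the correct value of pi because the number of threads you requested was actually granted to you (2 in this case). But if you request a million threads, the system may not give you that many. So, to find out how many threads were actually allocated to you, you need this piece of code: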

    if (id == 0) nthreads = nthrds;
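To see the difference between the requested and the granted thread count directly, something like the sketch below can be used. The helper name actual_team_size and the serial fallback stubs are assumptions made for illustration; omp_set_num_threads and omp_get_num_threads are the same calls used in the question.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed serial fallbacks so the sketch also builds without -fopenmp. */
static int omp_get_num_threads(void) { return 1; }
static void omp_set_num_threads(int n) { (void)n; }
#endif

/* Hypothetical helper: ask for `requested` threads and report how many
   the runtime actually put in the team. */
int actual_team_size(int requested)
{
    int nthreads = 1;  /* shared result, written by one thread only */

    omp_set_num_threads(requested);
    #pragma omp parallel
    {
        /* Inside the region, omp_get_num_threads() returns the size of
           the current team, which may be smaller than the request. */
        #pragma omp single
        nthreads = omp_get_num_threads();
    }
    return nthreads;
}
```

The return value is at most the requested count and can be lower, which is why the final summation loop should use the recorded value rather than NUM_THREADS.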

Avoiding conflicts between threads in OpenMP
