Question

I have some OpenMP code:

for(unsigned int it = 0; it < its; ++it)
{
    #pragma omp parallel
    {
        /**
         * Run the position integrator, reset the
         * acceleration, update the acceleration, update the velocity.
         */

        #pragma omp for schedule(dynamic, blockSize)
        for(unsigned int i = 0; i < numBods; ++i)
        {
            Body* body = &bodies[i];
            body->position += (body->velocity * timestep);
            body->position += (0.5 * body->acceleration * timestep * timestep);

            /**
             * Update velocity for half-timestep, then reset the acceleration.
             */
            body->velocity += (0.5f) * body->acceleration * timestep;
            body->acceleration = Vector3();
        }

        /**
         * Calculate the acceleration.
         */
        #pragma omp for schedule(dynamic, blockSize)
        for(unsigned int i = 0; i < numBods; ++i)
        {
            for(unsigned int j = 0; j < numBods; ++j)
            {
                if(j > i)
                {
                    Body* body = &bodies[i];
                    Body* bodyJ = &bodies[j];

                    /**
                     * Calculating some of the subsections of the acceleration formula.
                     */
                    Vector3 rij = bodyJ->position - body->position;
                    double sqrDistWithEps = rij.SqrMagnitude() + epsilon2;
                    double oneOverDistCubed = 1.0 / sqrt(sqrDistWithEps * sqrDistWithEps * sqrDistWithEps);
                    double scalar = oneOverDistCubed * gravConst;

                    body->acceleration += bodyJ->mass * scalar * rij;
                    bodyJ->acceleration -= body->mass * scalar * rij; //Newton's Third Law.
                }
            }
        }

        /**
         * Velocity for the full timestep.
         */
        #pragma omp for schedule(dynamic, blockSize)
        for(unsigned int i = 0; i < numBods; ++i)
        {
            bodies[i].velocity += (0.5 * bodies[i].acceleration * timestep);
        }
    }

    /**
     * Don't want I/O to be parallel
     */
    for(unsigned int index = 1; index < bodies.size(); ++index)
    {
        outFile << bodies[index] << std::endl;
    }
}

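This works fine, but I can't help thinking that allocating a team of threads on every iteration is a bad idea. The iterations must run in order, though, so I can't parallelize the iteration loop itself.

Is there a way to set this up so that the same team of threads is reused on every iteration? Also, feel free to point out any other problems in my code.

Thanks.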

Answer 1

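As far as I know, this is the most logical approach. The thread pool has already been created, and each time a thread reaches a parallel construct it requests a team of threads from the pool. So a new thread pool is not created every time a parallel region is reached. However, if you want to reuse the same threads, why not move the parallel construct out of the loop and handle the sequential code with the single pragma?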
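A minimal sketch of that restructuring, under stated assumptions: Particle and simulate are hypothetical stand-ins for the question's Body type and outer loop, the physics is reduced to a single position update, and the output goes to a string stream instead of outFile. It shows one parallel region around the whole iteration loop, with the I/O step wrapped in a single construct so that only one thread writes.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the question's Body type (one coordinate only).
struct Particle {
    double position = 0.0;
    double velocity = 1.0;
};

// Advances all particles for `its` steps and returns the text that the
// original code would have streamed to outFile.
std::string simulate(std::vector<Particle>& particles, unsigned its, double timestep)
{
    assert(timestep > 0.0);
    std::ostringstream log;
    const int n = static_cast<int>(particles.size());

    // One parallel region for the whole run: the team is formed once.
    #pragma omp parallel
    {
        // Every thread runs this loop; `it` is private because it is
        // declared inside the parallel region.
        for (unsigned it = 0; it < its; ++it)
        {
            // Work-sharing loop; implicit barrier at its end.
            #pragma omp for schedule(static)
            for (int i = 0; i < n; ++i)
                particles[i].position += particles[i].velocity * timestep;

            // Sequential section: exactly one thread does the I/O.
            // The implicit barrier at the end of single keeps the other
            // threads from starting the next step before output is done.
            #pragma omp single
            {
                for (int i = 0; i < n; ++i)
                    log << particles[i].position << '\n';
            }
        }
    }
    return log.str();
}
```

The implicit barriers after the omp for and omp single constructs are what keep the iterations in order. Compiled without OpenMP support, the pragmas are ignored and the function runs serially with the same result.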

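I did a quick search, and the first paragraph of this answer may depend on the OpenMP implementation you are using. I strongly recommend reading the manual for the one you use.

For example, from source: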

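OpenMP* is strictly a fork/join threading model. In some OpenMP implementations, threads are created at the start of a parallel region and destroyed at the end of the parallel region. OpenMP applications typically have several parallel regions with intervening serial regions. Creating and destroying threads for each parallel region can result in significant system overhead, especially if a parallel region is inside a loop; therefore, the Intel OpenMP implementation uses thread pools. A pool of worker threads is created at the first parallel region. These threads exist for the duration of program execution. More threads may be added automatically if requested by the program. The threads are not destroyed until the last parallel region is executed.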

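Nevertheless, if you put the parallel region outside the loop, you don't have to worry about the potential overhead described in the quoted paragraph.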
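Separately from thread reuse, and since the question invites comments on the rest of the code: in the acceleration loop, the iteration for body i also writes to bodies[j].acceleration, so two threads working on different values of i can update the same body at the same time, which is a data race. One possible way around it, sketched below under loud assumptions, is to let each iteration write only to its own body and visit every other body, at the cost of evaluating each pair twice. Body1D and computeAccelerations are hypothetical names, and the vectors are reduced to one dimension to keep the example short; this is not the asker's actual Vector3 code.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical 1-D stand-in for the question's Body/Vector3 types.
struct Body1D {
    double position;
    double mass;
    double acceleration;
};

// Each iteration i writes only bodies[i].acceleration, so no two threads
// ever update the same element. Every pair (i, j) is evaluated twice.
void computeAccelerations(std::vector<Body1D>& bodies, double gravConst, double epsilon2)
{
    assert(epsilon2 >= 0.0);
    const int n = static_cast<int>(bodies.size());

    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i)
    {
        double acc = 0.0;  // private accumulator for body i
        for (int j = 0; j < n; ++j)
        {
            if (j == i) continue;
            const double rij = bodies[j].position - bodies[i].position;
            const double sqrDistWithEps = rij * rij + epsilon2;
            const double oneOverDistCubed =
                1.0 / std::sqrt(sqrDistWithEps * sqrDistWithEps * sqrDistWithEps);
            acc += bodies[j].mass * gravConst * oneOverDistCubed * rij;
        }
        bodies[i].acceleration = acc;
    }
}
```

Inside an existing parallel region, the same loop would use omp for instead of omp parallel for. Keeping the Newton's Third Law shortcut is also possible, but it then needs some form of synchronization or per-thread accumulation, which is a different trade-off.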

Answer 2

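The OpenMP model is usually presented as a fork-join paradigm. For performance reasons, however, threads are not killed at the end of a join. In some implementations, such as Intel OpenMP, threads spin-wait at the end of the join for a certain period of time before going to sleep (see KMP_BLOCKTIME at https://software.intel.com/en-us/node/522775).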
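Because this behavior is implementation-dependent, one way to see what it costs on a given setup is to time both layouts. The following is only a rough sketch: timeRegionPerIteration and timeSingleRegion are hypothetical helper names, the workload is a trivial increment, and the measured numbers will vary with the runtime, the thread count, and settings such as KMP_BLOCKTIME.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

using Clock = std::chrono::steady_clock;

// Opens a new parallel region on every outer iteration (the question's layout).
// Returns the elapsed wall-clock time in seconds.
double timeRegionPerIteration(std::vector<double>& data, unsigned its)
{
    assert(!data.empty());
    const int n = static_cast<int>(data.size());
    const auto start = Clock::now();
    for (unsigned it = 0; it < its; ++it)
    {
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < n; ++i)
            data[i] += 1.0;
    }
    return std::chrono::duration<double>(Clock::now() - start).count();
}

// Opens one parallel region and keeps the same team for all iterations.
// Returns the elapsed wall-clock time in seconds.
double timeSingleRegion(std::vector<double>& data, unsigned its)
{
    assert(!data.empty());
    const int n = static_cast<int>(data.size());
    const auto start = Clock::now();
    #pragma omp parallel
    {
        for (unsigned it = 0; it < its; ++it)
        {
            // The implicit barrier after each omp for keeps iterations ordered.
            #pragma omp for schedule(static)
            for (int i = 0; i < n; ++i)
                data[i] += 1.0;
        }
    }
    return std::chrono::duration<double>(Clock::now() - start).count();
}
```

Calling both functions on the same vector with the same iteration count and comparing the returned times gives a first impression of whether re-entering the parallel region matters for a particular runtime.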
