Question

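I have a program that simulates the movement of entities on a 2D mesh. It runs a series of operations over several iterations. All of the operations work on the grid, and some of them need some kind of thread synchronization. I have experience with pthreads and know what is going on there, but this time I have to use OpenMP.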

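My problem is that I don't know (and I couldn't find any mention of it on the internet) whether I can use OpenMP directives inside functions. All the OpenMP examples I found use only a single function, and any calls to other functions are independent across threads. My intention is to create all the threads once and then run them in parallel many times. I know that each function in each iteration will be called once per thread; this is intended. Synchronization and work splitting should be handled inside those functions.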

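The program often crashes with a segmentation fault, usually somewhere in the finishStep function or in printPopulations. Debugging it with gdb is practically impossible. An example of a crash (the attributes xStart and xEnd are set only at program start):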

0x0000000000401d7d in printPopulations (world=0x607030) at apocalypse.c:210
210             Tile *currentCell = GET_TILE(world, x, y);
(gdb) bt
#0  0x0000000000401d7d in printPopulations (world=0x607030) at apocalypse.c:210
#1  0x0000000000401ef0 in printStatistics (world=0x607030) at apocalypse.c:253
#2  0x0000000000402126 in main._omp_fn () at apocalypse.c:316
#3  0x00007ffff78cef3e in ?? () from /usr/lib/libgomp.so.1
#4  0x00007ffff7475124 in start_thread () from /usr/lib/libpthread.so.0
#5  0x00007ffff71a94bd in clone () from /usr/lib/libc.so.6
(gdb) info locals
currentCell = 0x7ffff71360f9 <__GI__IO_do_write+25>
y = 414
humans = 0
x = -1511828487 // this is weird
y = 2
x = 2
humans = 800

Here is a simplified version of my code:

World * w = ...;
#pragma omp parallel num_threads(numThreads) default(shared)
{
  for (int i = 0; i < iters; i++) {
    simulateStep(w);
    finishStep(w);
    printPopulations(w);
    #pragma omp single // this should synchronize everything
    printf("time: %d\n", w->clock);
  }
  cleanup(); // each thread cleans its own threadprivate global variable.
}

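The simulation step contains two nested for loops. Only one thread should increment the clock counter. The threads synchronize at the end of the increment, so afterwards they all get the same value when they read world->clock. The grid is divided into equal parts; each of them spans the full height.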

void simulateStep(World * world) {
  #pragma omp single
  world->clock++;

  #pragma omp for schedule(static)
  for (int x = world->xStart; x <= world->xEnd; x++) {
    lockColumn(world, x);
    for (int y = world->yStart; y <= world->yEnd; y++) {
      // independent calculation for each tile [x, y]
    }
  }
}
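
To make the pattern concrete, here is a reduced, self-contained sketch of the same structure. The names MiniWorld, mini_step, and run_steps are hypothetical stand-ins and not part of my real program. The sketch assumes that single and for placed in a function called from a parallel region bind to that region (orphaned directives), and that every thread of the team calls the function.

```c
#include <assert.h>

/* Hypothetical, reduced stand-in for the real World structure. */
typedef struct {
    int clock;
    int xStart, xEnd;
    int cells[64];
} MiniWorld;

/* Must be called by every thread of the enclosing parallel region. */
void mini_step(MiniWorld *w) {
    /* One thread advances the clock; the implicit barrier at the end of
       single makes the new value visible to the whole team. */
    #pragma omp single
    w->clock++;

    /* Columns are divided among the threads of the caller's team. The
       implicit barrier at the end keeps the next clock++ from racing
       with these reads. */
    #pragma omp for schedule(static)
    for (int x = w->xStart; x <= w->xEnd; x++)
        w->cells[x] += w->clock;
}

/* Creates the team once and runs all iterations inside it. */
int run_steps(MiniWorld *w, int iters) {
    assert(w->xStart >= 0 && w->xEnd < 64);
    #pragma omp parallel default(shared)
    {
        for (int i = 0; i < iters; i++)
            mini_step(w);
    }
    return w->clock;
}
```

After three iterations each cell in the range has accumulated 1 + 2 + 3, whether the code is built with or without OpenMP enabled.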

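Because the entities move on the grid and can move outside its range, we need to move them back from the borders. Motivation: each side is handled by at most one thread. If there are more threads than sections, the extra ones wait at the end of the sections region. If there are fewer, some threads run several sections.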

void finishStep(World * world) {
  #pragma omp sections
  {
    #pragma omp section
    {
      MOVE_BACK(...) // moves the entity on the left border back
    }
    #pragma omp section
    {
      MOVE_BACK(...) // right
    }
    #pragma omp section
    {
      MOVE_BACK(...) // top
    }
    #pragma omp section
    {
      MOVE_BACK(...) // bottom
    }
  }
  // this function basically traverses the world and modifies all tiles.
  // it contains two nested fors which are parallelized like in printPopulations.
  resetWorld(world);
}
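
The sections part can be isolated in the same way. In this sketch move_back is a hypothetical placeholder for MOVE_BACK that only records which border was handled, and finish_borders and run_borders are illustrative names. Each section writes to its own array element, so the sections do not conflict with each other.

```c
#include <assert.h>

enum { LEFT, RIGHT, TOP, BOTTOM, NUM_BORDERS };

/* Hypothetical placeholder for MOVE_BACK: counts how often a border ran. */
static void move_back(int *handled, int border) {
    handled[border]++;
}

/* Must be called by every thread of the enclosing parallel region. */
void finish_borders(int *handled) {
    /* Each section is executed by exactly one thread of the team. */
    #pragma omp sections
    {
        #pragma omp section
        move_back(handled, LEFT);
        #pragma omp section
        move_back(handled, RIGHT);
        #pragma omp section
        move_back(handled, TOP);
        #pragma omp section
        move_back(handled, BOTTOM);
    }
    /* Implicit barrier here: no thread continues before all four are done. */
}

/* Runs the border step iters times inside one parallel region and
   returns the total number of border passes. */
int run_borders(int *handled, int iters) {
    assert(iters >= 0);
    #pragma omp parallel default(shared)
    {
        for (int i = 0; i < iters; i++)
            finish_borders(handled);
    }
    int total = 0;
    for (int b = 0; b < NUM_BORDERS; b++)
        total += handled[b];
    return total;
}
```

If the directives bind as assumed, every border is handled once per iteration regardless of the number of threads.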

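Printing the population follows. Motivation: we need a variable that is shared within the function, so humans is static. Only one thread should reset the value. Each thread counts the humans in its part of the grid, and at the end the counts are reduced into the shared variable humans. Finally, one of the threads should print the population count.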

void printPopulations(World * world) {
  static int humans;
  #pragma omp single
  humans = 0;

  #pragma omp for collapse(2) schedule(guided, 10) reduction(+: humans)
  for (int x = world->xStart; x <= world->xEnd; x++) {
    for (int y = world->yStart; y <= world->yEnd; y++) {
      Tile *currentCell = GET_TILE(world, x, y);
      if (currentCell->entity != NULL)
        humans++;
    }
  }

  #pragma omp single
  printf("Time: %d \tHumans: %4d\n", world->clock, humans);
}
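
The counting pattern above can also be tried in isolation. The following is a minimal sketch with hypothetical names (GRID_W, GRID_H, count_occupied, run_count) and a plain int grid in place of Tile pointers. It assumes that a static local variable counts as shared for the purpose of the reduction clause on an orphaned for.

```c
#include <assert.h>

#define GRID_W 8
#define GRID_H 6

/* Must be called by every thread of the enclosing parallel region.
   Exactly one thread stores the final count in *out. */
void count_occupied(int grid[GRID_W][GRID_H], int *out) {
    static int humans; /* static storage: one copy shared by the team */

    /* One thread resets the counter; single ends with a barrier. */
    #pragma omp single
    humans = 0;

    /* Each thread counts into a private copy; the copies are added to
       the shared variable at the end of the loop. */
    #pragma omp for collapse(2) reduction(+: humans)
    for (int x = 0; x < GRID_W; x++)
        for (int y = 0; y < GRID_H; y++)
            if (grid[x][y] != 0)
                humans++;

    /* One thread publishes the result; the barrier at the end of single
       keeps the next reset from overtaking this read. */
    #pragma omp single
    *out = humans;
}

/* Repeats the count inside a single parallel region. */
int run_count(int grid[GRID_W][GRID_H], int repeats) {
    int result = -1;
    assert(repeats > 0);
    #pragma omp parallel default(shared)
    {
        for (int i = 0; i < repeats; i++)
            count_occupied(grid, &result);
    }
    return result;
}
```

Because the counter is static, the function is not safe to call from two different parallel regions at the same time; that limitation is part of the sketch, not something the reduction fixes.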

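Can I use for, single, and sections regions this way, inside functions that are called in parallel? Or can I use them only inside the function that defines the parallel region?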

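Please try to explain why; I tried to find something about this in the OpenMP specification but failed. I know that I can use critical regions inside functions. Should I replace #pragma omp single with a critical region? Like this: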

static int clock; // time of processing
#pragma omp critical(nameOfRegion)
{
  if (clock < world->clock) {
    clock = world->clock;
    // this part will be run only once per each timestamp
  }
}
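
A self-contained sketch of this critical-region guard, with hypothetical names (claim_timestamp, count_claims, seen_clock). One difference from single that the sketch makes explicit: critical has no implied barrier, so the other threads do not wait for the thread that claimed the timestamp unless a barrier is added.

```c
#include <assert.h>

/* Returns 1 for the one caller that is first to see a new timestamp,
   0 for everybody else. */
int claim_timestamp(int current_clock) {
    static int seen_clock = 0; /* last timestamp already claimed; shared */
    int claimed = 0;
    #pragma omp critical(claimTimestamp)
    {
        if (seen_clock < current_clock) {
            seen_clock = current_clock;
            claimed = 1;
        }
    }
    return claimed;
}

/* Counts how many claims succeed over timestamps 1..timestamps. */
int count_claims(int timestamps) {
    int claims = 0;
    assert(timestamps >= 0);
    #pragma omp parallel default(shared)
    {
        for (int t = 1; t <= timestamps; t++) {
            if (claim_timestamp(t)) {
                /* Stands in for the once-per-timestamp work. */
                #pragma omp atomic
                claims++;
            }
            /* Explicit barrier: the others wait for the once-only work,
               which single would have given for free. */
            #pragma omp barrier
        }
    }
    return claims;
}
```

The static seen_clock keeps its value between calls, so an already claimed timestamp is never claimed again.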

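I could replace sections in a similar way: a loop in which access to an array is critical. Each thread finds an unprocessed element and then uses a switch to jump to the corresponding section.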
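
Roughly, that hand-written replacement for sections could look like the sketch below. All names (claim_part, run_parts, run_all_parts) and the values written in each case are hypothetical; the taken array would also have to be reset before every iteration in a real loop.

```c
#include <assert.h>

enum { NUM_PARTS = 4 };

/* Claims the next unprocessed part, or returns -1 if none is left. */
static int claim_part(int *taken) {
    int part = -1;
    #pragma omp critical(claimPart)
    {
        for (int i = 0; i < NUM_PARTS && part < 0; i++) {
            if (!taken[i]) {
                taken[i] = 1;
                part = i;
            }
        }
    }
    return part;
}

/* Must be called by every thread of the enclosing parallel region. */
void run_parts(int *taken, int *done) {
    int part;
    while ((part = claim_part(taken)) >= 0) {
        switch (part) { /* jump to the "section" that was claimed */
        case 0: done[0] = 10; break; /* left   */
        case 1: done[1] = 20; break; /* right  */
        case 2: done[2] = 30; break; /* top    */
        case 3: done[3] = 40; break; /* bottom */
        }
    }
    /* sections would end with an implicit barrier; here it is explicit. */
    #pragma omp barrier
}

/* Runs all parts once and returns the sum of what they wrote. */
int run_all_parts(int *done) {
    int taken[NUM_PARTS] = {0};
    assert(done != 0);
    #pragma omp parallel default(shared)
    {
        run_parts(taken, done);
    }
    return done[0] + done[1] + done[2] + done[3];
}
```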

But for the for loops, I would have to divide the range manually, or basically write my own scheduler using locks or critical regions.
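
Dividing the range by hand would look something like the following sketch, which mimics a static schedule. block_range and sum_columns are hypothetical names; omp_get_thread_num and omp_get_num_threads are the standard runtime calls, guarded so that the code also builds without OpenMP.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Splits the inclusive range [lo, hi] into nthreads contiguous blocks and
   stores the block of thread tid in [*start, *end]. An empty block is
   reported as *start > *end. */
void block_range(int lo, int hi, int tid, int nthreads, int *start, int *end) {
    assert(nthreads > 0 && tid >= 0 && tid < nthreads);
    int n = hi - lo + 1;
    if (n < 0)
        n = 0;
    int base = n / nthreads;  /* minimum block size */
    int extra = n % nthreads; /* the first `extra` threads get one more */
    *start = lo + tid * base + (tid < extra ? tid : extra);
    *end = *start + base + (tid < extra ? 1 : 0) - 1;
}

/* Sums cols[lo..hi], with each thread working on its own block. */
long sum_columns(const int *cols, int lo, int hi) {
    long total = 0;
    #pragma omp parallel default(shared)
    {
        int tid = 0, nthreads = 1;
#ifdef _OPENMP
        tid = omp_get_thread_num();
        nthreads = omp_get_num_threads();
#endif
        int start, end;
        block_range(lo, hi, tid, nthreads, &start, &end);
        long partial = 0;
        for (int x = start; x <= end; x++)
            partial += cols[x];
        #pragma omp atomic
        total += partial;
    }
    return total;
}
```

This covers only the static case; guided or dynamic scheduling would indeed need a shared counter protected by a lock, a critical region, or an atomic update.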

Or (the worst case): I would have to move the parallel region from main into every function in which I want to parallelize a loop.
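
For comparison, that worst case would look like the sketch below, where the function opens and closes its own parallel region on every call. count_nonzero is a hypothetical name. As far as I understand, common runtimes keep their worker threads around between regions, so the cost may be smaller than creating threads each time, but I have not measured it.

```c
#include <assert.h>

/* Called from serial code; the team exists only for this one loop. */
int count_nonzero(const int *cells, int n) {
    assert(n >= 0);
    int count = 0;
    #pragma omp parallel for schedule(static) reduction(+: count)
    for (int i = 0; i < n; i++)
        if (cells[i] != 0)
            count++;
    return count;
}
```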

OpenMP directives inside functions

0 answers: