Question

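I am doing a thermodynamic simulation on a two-dimensional array. The array is 1024x1024. The while loop iterates a specified number of times or until goodTempChange is false. goodTempChange is set to true or false depending on whether the temperature change of a block is greater than a defined EPSILON value. If every block in the array is below that value, the plate is in stasis. The program works and I have no problems with the code itself; my problem is that the serial code is absolutely blowing the OpenMP code out of the water, and I don't know why. I have tried removing everything except the average calculation, which is just the average of the four blocks above, below, left and right of the block in question, and it still gets destroyed by the serial code. I have never used OpenMP before, and I looked things up online to get what I have. I put the shared variables inside critical regions in the most efficient way I could see, and I have no race conditions. I really don't see what is wrong. Any help would be greatly appreciated. Thanks.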

while(iterationCounter < DESIRED_ITERATIONS && goodTempChange) {
  goodTempChange = false;
  if((iterationCounter % 1000 == 0) && (iterationCounter != 0)) {
    cout << "Iteration Count      Highest Change    Center Plate Temperature" << endl;
    cout << "-----------------------------------------------------------" << endl;
    cout << iterationCounter << "               "
         << highestChange << "            " << newTemperature[MID][MID] << endl;
    cout << endl;
  }

  highestChange = 0;

  if(iterationCounter != 0)
    memcpy(oldTemperature, newTemperature, sizeof(oldTemperature));

  for(int i = 1; i < MAX-1; i++) {  
  #pragma omp parallel for schedule(static) 
    for(int j = 1; j < MAX-1; j++) {
      bool tempGoodChange = false;
      double tempHighestChange = 0;
      newTemperature[i][j] = (oldTemperature[i-1][j] + oldTemperature[i+1][j] +
                              oldTemperature[i][j-1] + oldTemperature[i][j+1]) / 4;

      if((iterationCounter + 1) % 1000 == 0) {
        if(abs(oldTemperature[i][j] - newTemperature[i][j]) > highestChange)
          tempHighestChange = abs(oldTemperature[i][j] - newTemperature[i][j]);
        if(tempHighestChange > highestChange) {
          #pragma omp critical
          {
            if(tempHighestChange > highestChange)
              highestChange = tempHighestChange;
          }
        }
      }
      if(abs(oldTemperature[i][j] - newTemperature[i][j]) > EPSILON
         && !tempGoodChange)
        tempGoodChange = true;

      if(tempGoodChange && !goodTempChange) {
        #pragma omp critical
        {
          if(tempGoodChange && !goodTempChange)
            goodTempChange = true;
        }
      }
    }
  }
  iterationCounter++;
}

Answer 1

Trying to get rid of those critical sections may help. For example:

#pragma omp critical
{
  if(tempHighestChange > highestChange)
  {
    highestChange = tempHighestChange;
  }
}

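Here, you can store the highestChange computed by each thread in a local variable and, once the parallel section has finished, take the maximum of those per-thread highestChange values.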
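A minimal sketch of that idea, assuming both grids are flattened into std::vector<double> of equal size. The function name maxChange and its parameters are illustrative and do not come from the question. Each thread keeps a private maximum and merges it into the shared value once, after finishing its share of the loop, so the critical section is entered once per thread instead of once per cell.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper: largest absolute difference between two grids
// of equal size (both flattened into 1D vectors).
double maxChange(const std::vector<double>& oldT,
                 const std::vector<double>& newT) {
  double highestChange = 0.0;                 // shared result
  const long n = static_cast<long>(oldT.size());
#pragma omp parallel
  {
    double localHighest = 0.0;                // private to each thread
#pragma omp for nowait
    for (long k = 0; k < n; ++k)
      localHighest = std::max(localHighest, std::abs(oldT[k] - newT[k]));

    // Merge once per thread, not once per cell.
#pragma omp critical
    highestChange = std::max(highestChange, localHighest);
  }
  return highestChange;
}
```

If the code is compiled without OpenMP support, the pragmas are ignored and the function still returns the same value serially.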

Answer 2

Here is my attempt (not tested).

double**newTemperature;
double**oldTemperature;

while(iterationCounter < DESIRED_ITERATIONS && goodTempChange) {
  if((iterationCounter % 1000 == 0) && (iterationCounter != 0))
    std::cout
      << "Iteration Count      Highest Change    Center Plate Temperature\n"
      << "---------------------------------------------------------------\n" 
      << iterationCounter << "               "
      << highestChange << "            "
      << newTemperature[MID][MID] << '\n' << std::endl;

  goodTempChange = false;
  highestChange  = 0;

  // swap pointers to arrays (but not the arrays themselves!)
  if(iterationCounter != 0)
    std::swap(newTemperature,oldTemperature);

  bool CheckTempChange = (iterationCounter + 1) % 1000 == 0;
#pragma omp parallel
  {
    bool localGoodChange = false;
    double localHighestChange = 0;
#pragma omp for
    for(int i = 1; i < MAX-1; i++) {
      //
      // note that putting a second
      // #pragma omp for
      // here has (usually) zero effect. this is called nested parallelism and
      // usually not implemented, thus the new nested team of threads has only
      // one thread.
      //
      for(int j = 1; j < MAX-1; j++) {
        newTemperature[i][j] = 0.25 *   // multiply is faster than divide
          (oldTemperature[i-1][j] + oldTemperature[i+1][j] +
           oldTemperature[i][j-1] + oldTemperature[i][j+1]);
        if(CheckTempChange)
          localHighestChange =
            std::max(localHighestChange,
                     std::abs(oldTemperature[i][j] - newTemperature[i][j]));
        localGoodChange = localGoodChange ||
          std::abs(oldTemperature[i][j] - newTemperature[i][j]) > EPSILON;
        // shouldn't this be < EPSILON? in the previous line?
      }
    }
    //
    // note that we have moved the critical sections out of the loops to
    // avoid any potential issues with contentions (on the mutex used to
    // implement the critical section). Also note that I named the sections,
    // allowing simultaneous update of goodTempChange and highestChange
    //
    if(!goodTempChange && localGoodChange)
#pragma omp critical(TempChangeGood)
      goodTempChange = true;
    if(CheckTempChange && localHighestChange > highestChange)
#pragma omp critical(TempChangeHighest)
      highestChange = std::max(highestChange,localHighestChange);
  }
  iterationCounter++;
}

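There are several changes from the original:

The outer rather than the inner of the nested for loops is executed in parallel. This should make a significant difference. Added in edit: it appears from the comments that you do not understand the significance of this, so let me explain. In your original code, the outer loop (over i) is executed only by the master thread. For every i, a team of threads is created to perform the inner loop over j in parallel. This creates a synchronisation overhead (with significant imbalance) at every i! If one instead parallelises the outer loop over i, this overhead is incurred only once, and each thread runs the entire inner loop over j for its share of i. Thus, always parallelising the outermost loop possible is basic wisdom for multi-threaded coding.
The double for loop is inside a parallel region to minimise the critical-region calls to one per thread per while-loop iteration. You may also consider putting the whole while loop inside a parallel region.
I also swap between the two arrays (similar to what other answers suggest) to avoid the memcpy, but this should not really be performance critical. Added in edit: std::swap(newTemperature,oldTemperature) only swaps the pointer values, not the memory pointed to; of course, that is the point.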
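As an alternative sketch of the first two points, the per-thread locals and the named critical sections can be replaced by OpenMP reduction clauses (max reductions in C/C++ need OpenMP 3.1 or later). This is a swapped-in technique, not the code above. The names sweep, Grid and SweepResult are hypothetical, and the grid is assumed to be a vector of rows.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

using Grid = std::vector<std::vector<double>>;  // assumed layout: grid[i][j]

struct SweepResult {        // hypothetical return type
  bool changed;             // some cell changed by more than epsilon
  double highestChange;     // largest absolute change in this sweep
};

// One averaging sweep over the interior cells, parallelised over the
// outer loop. Each thread gets private copies of `changed` and `highest`,
// which OpenMP combines when the loop ends.
SweepResult sweep(const Grid& oldT, Grid& newT, double epsilon) {
  const int n = static_cast<int>(oldT.size());
  bool changed = false;
  double highest = 0.0;
#pragma omp parallel for reduction(||:changed) reduction(max:highest)
  for (int i = 1; i < n - 1; ++i) {
    for (int j = 1; j < n - 1; ++j) {
      newT[i][j] = 0.25 * (oldT[i-1][j] + oldT[i+1][j] +
                           oldT[i][j-1] + oldT[i][j+1]);
      const double diff = std::abs(oldT[i][j] - newT[i][j]);
      highest = std::max(highest, diff);
      changed = changed || diff > epsilon;
    }
  }
  return {changed, highest};
}
```

The caller would loop while sweep(...).changed is true and swap the two grids between sweeps.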

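Finally, don't forget that the proof of the pudding is in the eating: just try what difference it makes to have the #pragma omp for in front of the inner or the outer loop. Always do experiments like this before asking on SO; otherwise you can rightly be accused of not having done sufficient research.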
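The experiment suggested here could be set up roughly as follows. This is only a sketch: sweepOuter, sweepInner, timeSweeps and the Grid alias are hypothetical names, the grid is assumed to be a vector of rows, and the hot top edge is an arbitrary boundary condition chosen for illustration.

```cpp
#include <chrono>
#include <utility>
#include <vector>

using Grid = std::vector<std::vector<double>>;  // assumed layout: grid[i][j]

// Pragma in front of the outer loop: one parallel region per sweep.
void sweepOuter(const Grid& oldT, Grid& newT) {
  const int n = static_cast<int>(oldT.size());
#pragma omp parallel for
  for (int i = 1; i < n - 1; ++i)
    for (int j = 1; j < n - 1; ++j)
      newT[i][j] = 0.25 * (oldT[i-1][j] + oldT[i+1][j] +
                           oldT[i][j-1] + oldT[i][j+1]);
}

// Pragma in front of the inner loop: one parallel region per row.
void sweepInner(const Grid& oldT, Grid& newT) {
  const int n = static_cast<int>(oldT.size());
  for (int i = 1; i < n - 1; ++i) {
#pragma omp parallel for
    for (int j = 1; j < n - 1; ++j)
      newT[i][j] = 0.25 * (oldT[i-1][j] + oldT[i+1][j] +
                           oldT[i][j-1] + oldT[i][j+1]);
  }
}

// Hypothetical harness: wall-clock seconds for `iterations` sweeps
// on an n x n grid.
template <typename Sweep>
double timeSweeps(Sweep sweep, int n, int iterations) {
  Grid a(n, std::vector<double>(n, 0.0));
  for (int j = 0; j < n; ++j) a[0][j] = 100.0;  // arbitrary hot top edge
  Grid b = a;                                   // same boundary in both grids
  const auto start = std::chrono::steady_clock::now();
  for (int it = 0; it < iterations; ++it) {
    sweep(a, b);
    std::swap(a, b);  // O(1): exchanges the vectors' internals, copies no cells
  }
  const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count();
}
```

Comparing timeSweeps(sweepOuter, 1024, 1000) with timeSweeps(sweepInner, 1024, 1000), built once with and once without OpenMP enabled, gives the four numbers worth looking at. The results depend on the machine and compiler, so none are quoted here.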

Answer 3

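I assume you are concerned with the time taken by all of the code inside the while loop, not just the time taken by the loop that begins for(int i = 1; i < MAX-1; i++).

This operation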

if(iterationCounter != 0)
{
    memcpy(oldTemperature, newTemperature, sizeof(oldTemperature));
}

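is unnecessary and, for large arrays, may be enough to kill performance. Instead of maintaining two arrays, old and new, maintain one 3D array with two planes. Create two integer variables, let's call them oldIdx and newIdx (new itself is a reserved word in C++), and initially set them to 0 and 1. Replace

newTemperature[i][j] = ((oldTemperature[i-1][j] +  oldTemperature[i+1][j] + oldTemperature[i][j-1] + oldTemperature[i][j+1]) / 4);

with

temperature[newIdx][i][j] = 
  (temperature[oldIdx][i-1][j] +
   temperature[oldIdx][i+1][j] +
   temperature[oldIdx][i][j-1] +
   temperature[oldIdx][i][j+1])/4;

and, at the end of each update, swap the values of oldIdx and newIdx so that the next update goes the other way round. I will leave it to you to determine whether oldIdx/newIdx should be the first index into your array or the last. This approach eliminates the need to move (large amounts of) data around in memory.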
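A minimal sketch of the two-plane idea, assuming the plane index comes first and both planes live in one flat buffer. TwoPlaneGrid, at, step, oldIdx and newIdx are all illustrative names. Note that the boundary cells have to be initialised in both planes, because a sweep only ever writes interior cells.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical container: two n x n planes stored back to back,
// addressed as (plane, i, j) with the plane index first.
struct TwoPlaneGrid {
  int n;
  std::vector<double> data;   // 2 * n * n cells, zero-initialised
  int oldIdx = 0;             // plane that is read during a sweep
  int newIdx = 1;             // plane that is written during a sweep

  explicit TwoPlaneGrid(int size)
      : n(size), data(static_cast<std::size_t>(2) * size * size, 0.0) {}

  double& at(int plane, int i, int j) {
    return data[(static_cast<std::size_t>(plane) * n + i) * n + j];
  }

  // One averaging sweep from the old plane into the new one,
  // after which the two planes exchange roles.
  void step() {
    for (int i = 1; i < n - 1; ++i)
      for (int j = 1; j < n - 1; ++j)
        at(newIdx, i, j) = (at(oldIdx, i-1, j) + at(oldIdx, i+1, j) +
                            at(oldIdx, i, j-1) + at(oldIdx, i, j+1)) / 4;
    std::swap(oldIdx, newIdx);  // only two ints change; no cell is copied
  }
};
```

After step() returns, the freshly computed values are in the plane that oldIdx now refers to, ready to be read by the next sweep.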

This SO question and answer covers another possible cause of a severe slowdown or a failure to speed up. Whenever I see arrays with sizes of 2^n, I suspect cache issues.

C++ OpenMP much slower than serial implementation
