Question

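This is my first post here, so go easy on me! I have a very strange problem. I've written a C code that converts particle data into grid data (the data comes from a cosmological simulation). To do this conversion, I'm using the GSL Monte Carlo VEGAS integrator. When I run it in serial, it runs fine and gives me the right answer (albeit slowly). To speed it up, I tried OpenMP. The problem is that when I run it in parallel, the integration times out (I have a MAX_ITER variable in the integration loop to avoid an infinite loop caused by lack of convergence). The random number generator is set up and initialized before the parallel block of code. I have checked and double-checked, and all of the data for the particle that fails in parallel (the x, y, and z positions and the integration bounds passed to the integrator) are the same in serial and in parallel. I also tried increasing MAX_ITER from 100 to 1000, but that did nothing; it just takes longer to fail.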

My question is: does anyone know why the code would run in serial but time out in parallel when using exactly the same particle?

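Also, in case you need them, the numbers for the problematic particle are: x = 0.630278, y = 24.952896, z = 3.256376, h = 3 (this is the particle's smoothing length, which is used to "smear out" the particle's mass, since the goal of the simulation is to sample a fluid using particles; this is the SPH method), x integration bounds (lower, upper) = {0,630278}, y bounds = {21.952896,27.952896}, and z bounds = {0.256376,6.256375}.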

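The idea behind the conversion is that a particle's mass is contained within a "smoothing sphere" of radius h centered on the particle itself. This mass is not distributed uniformly; it follows the SPH kernel (this is the function I'm integrating). So, depending on how the sphere sits within its "home cell", only part of the sphere may actually lie inside that cell. The goal is then to get the proper integration bounds and pass them to the integrator. The integration code (below) includes a check: if the Monte Carlo integrator supplies a point that lies outside the sphere, the integrand returns 0 (this is because getting the exact integration limits for every possible case is a huge pain).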

My loop code is here:

// Loop over every particle
#pragma omp parallel shared(M_P, m_header, NumPartTot, num_grid_elements, cxbounds, \
cybounds, czbounds, master_cell) private(index, x, y, z, i, j, k, h, tid, cell, \
corners)
{
tid = omp_get_thread_num();

  // Set up cell struct. Allocate memory!
  cell = set_up_cell();

  #pragma omp for
  for(index = 1; index <= NumPartTot; index++)
  {    
     printf("\n\n\n************************************\n");
     printf("Running particle: %d on thread: %d\n", index, tid);
     printf("x = %f  y = %f   z = %f\n", M_P[index].Pos[0], M_P[index].Pos[1], M_P[index].Pos[2]);
     printf("**************************************\n\n\n");
     fflush(stdout);
     // Set up convenience variables
     x = M_P[index].Pos[0];
     y = M_P[index].Pos[1];
     z = M_P[index].Pos[2];

     // Figure out which cell the particle is in
     i = (int)((x / m_header.BoxSize) * num_grid_elements);
     j = (int)((y / m_header.BoxSize) * num_grid_elements);
     k = (int)((z / m_header.BoxSize) * num_grid_elements);

     corners = get_corners(i, j, k);

     // Check to see what type of particle we're dealing with
     if(M_P[index].Type == 0)
     {    
        h = M_P[index].hsml;
        convert_gas(i, j, k, x, y, z, h, index, cell, corners);
     }    

     else 
     {    
        update_cell_non_gas_properties(index, i, j, k, cell);
     }    
  }    

  // Copy each thread's version of cell to cell_master
  #ifdef _OPENMP
     copy_to_master_cell(cell);
     free_cell(cell);
  #endif
} /*-- End of parallel region --*/
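The body of copy_to_master_cell is not shown. As a sketch of the pattern the loop above appears to follow (each thread fills a private grid, then merges it into a shared one), the example below uses a hypothetical fixed-size grid and a hypothetical function name, and assumes the merge step needs mutual exclusion because every thread writes to the same shared array.

```c
#include <stddef.h>

#define N_CELLS 8  /* hypothetical grid size for this sketch */

/* Deposit one unit of mass per particle into a shared grid.
 * Each thread accumulates into its own private copy first and
 * merges it into the shared grid inside a critical section. */
void accumulate_masses(double master[N_CELLS], int n_particles)
{
    #pragma omp parallel
    {
        double local[N_CELLS] = {0.0};  /* thread-private grid */

        #pragma omp for
        for (int p = 0; p < n_particles; p++)
            local[p % N_CELLS] += 1.0;  /* stand-in for the real deposit */

        /* Only one thread at a time may update the shared grid. */
        #pragma omp critical
        {
            for (int c = 0; c < N_CELLS; c++)
                master[c] += local[c];
        }
    }
}
```

Compiled without OpenMP, the pragmas are ignored and the function runs as a plain serial loop, which makes it easy to compare serial and parallel results.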

The problem arises in the convert_gas function. The part in question is here (in the home cell block):

 // Case 6: Left face
if(((x + h) < cxbounds[i][j][k].hi) && ((x - h) < cxbounds[i][j][k].lo) &&
((y + h) < cybounds[i][j][k].hi) && ((y - h) >= cybounds[i][j][k].lo) &&
((z + h) < czbounds[i][j][k].hi) && ((z - h) >= czbounds[i][j][k].lo))
{
  printf("Using case 6\n");
  fflush(stdout);

  // Home cell
  ixbounds.lo = cxbounds[i][j][k].lo;
  ixbounds.hi = x + h;
  iybounds.lo = y - h;
  iybounds.hi = y + h;
  izbounds.lo = z - h;
  izbounds.hi = z + h;

  kernel = integrate(ixbounds, iybounds, izbounds, h, index, i, j, k);

  update_cell_gas_properties(kernel, i, j, k, index, cell);

  // Left cell
  ixbounds.lo = x - h;
  ixbounds.hi = cxbounds[i][j][k].lo;
  iybounds.lo = y - h; // Not actual bounds. See note above.
  iybounds.hi = y + h;
  izbounds.lo = z - h;
  izbounds.hi = z + h;

  kernel = integrate(ixbounds, iybounds, izbounds, h, index, i - 1, j, k);

  update_cell_gas_properties(kernel, i - 1, j, k, index, cell);

  return;

}

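The data I'm currently using is test data, so I know exactly where the particles should be and what integration bounds they should have. Using gdb, I found that all of these numbers are correct. The integration loop in the function integrate is here (TOLERANCE is 0.2, WARM_CALLS is 10000, and N_CALLS is 100000):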

 gsl_monte_vegas_init(monte_state);
  // Warm up
  gsl_monte_vegas_integrate(&monte_function, lower_bounds, upper_bounds, 3,
                            WARM_CALLS, random_generator, monte_state, &result, &error);

  // Actual integration
  do
  {
     gsl_monte_vegas_integrate(&monte_function, lower_bounds, upper_bounds, 3,
                               N_CALLS, random_generator, monte_state, &result, &error);
     iter++;

  } while(fabs(gsl_monte_vegas_chisq(monte_state) - 1.0) > TOLERANCE && iter < MAX_ITER);



  if(iter >= MAX_ITER)
  {
     fprintf(stdout, "ERROR!!! Max iterations %d exceeded!!!\n"
             "See M_P[%d].id : %d   (%f %f %f)\n"
             "lower bnds : (%f %f %f)   upper bnds : (%f %f %f)\n"
              "trying to integrate in cell %d %d %d\n\n", MAX_ITER, pind, M_P[pind].id,
              M_P[pind].Pos[0], M_P[pind].Pos[1], M_P[pind].Pos[2],
              ixbounds.lo, iybounds.lo, izbounds.lo, ixbounds.hi, iybounds.hi, izbounds.hi, i, j, k);
     fflush(stdout);
     exit(1);
  }

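Again, this exact code (but without OpenMP; I pass that compile-time option as an option in the makefile) runs fine in serial with exactly the same numbers, but not in parallel. I'm sure it's something stupid I've done that I just can't see at the moment (at least, I hope!). Anyway, thanks for your help!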

GSL integration fails with OpenMP

0 Answers: