MPI versus sequential code - problem freeing arrays

Time: 2017-08-30 03:27:56

Tags: arrays mpi free

I get a strange difference between the sequential and MPI versions of a small code that computes values on a grid.

The sequential version looks like this:

int main() {

   /* Array */
   double **x;

   /* Allocation of 2D arrays */
   x = malloc(size_tot_y*sizeof(*x));

   for (i=0;i<=size_tot_y-1;i++) {
      x[i] = malloc(size_tot_x*sizeof(**x));
   }

   /* Do various computations */

   /* End of code */

   /* Free all arrays */
   for (i=0;i<=size_tot_y-1;i++) {
      free(x[i]);
   }
   free(x);

   return 0;

}

This sequential version works fine, and all the arrays (x, x0) seem to be correct.

Now, the MPI version looks like this:

 int main() {

   /* Array */
   double **x;
   double *xfinal;

   /* Allocate size_tot_y rows */
   x = malloc(size_tot_y*sizeof(*x));

   /* Allocate 2D Contiguous arrays for x */
   x[0] = malloc(size_tot_x*size_tot_y*sizeof(**x));

   /* Loop on rows */
   for (j=1;j<size_tot_y;j++) {
    /* Increment size_tot_y block on x[i] and x0[i] address */
    x[j] = x[0] + j*size_tot_x;
   }

       /* Do various computations */

       /* End of MPI code */

   /* Free all arrays */
   for (i=0;i<=size_tot_y-1;i++) {
      free(x[i]);
   }
   free(x);

   return 0;

   }

When I run it, I get the following error:

[machine1:04130] *** Process received signal ***
[machine1:04130] Signal: Segmentation fault (11)
[machine1:04130] Signal code: Address not mapped (1)
[machine1:04130] Failing at address: 0x7f179c020838
[machine1:04131] *** Process received signal ***
[machine1:04131] Signal: Segmentation fault (11)
[machine1:04131] Signal code: Address not mapped (1)
[machine1:04131] Failing at address: 0x7ff0b417c838
[machine1:04132] *** Process received signal ***
[machine1:04132] Signal: Segmentation fault (11)
[machine1:04132] Signal code: Address not mapped (1)
[machine1:04132] Failing at address: 0x7f8560001838
[machine1:04133] *** Process received signal ***
[machine1:04133] Signal: Segmentation fault (11)
[machine1:04133] Signal code: Address not mapped (1)
[machine1:04133] Failing at address: 0x7f22f415f838
[machine1:04134] *** Process received signal ***
[machine1:04140] *** Process received signal ***

          [machine1:04134] Signal: Segmentation fault (11)
          [machine1:04134] Signal code: Address not mapped (1)
          [machine1:04134] Failing at address: 0x7f4e3c0d3838
          [machine1:04142] *** Process received signal ***
          [machine1:04142] Signal: Segmentation fault (11)
          [machine1:04142] Signal code: Address not mapped (1)
          [machine1:04142] Failing at address: 0x7ff0d4064838
          [machine1:04140] Signal: Segmentation fault (11)
          [machine1:04140] Signal code: Address not mapped (1)
          [machine1:04140] Failing at address: 0x7fb2941c3838
          [machine1:04129] *** Process received signal ***
          [machine1:04129] Signal: Segmentation fault (11)
          [machine1:04129] Signal code: Address not mapped (1)
          [machine1:04129] Failing at address: 0x7f9150049838
          [machine1:04142] [machine1:04134] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7f4e48e55890]
          [machine1:04134] [machine1:04129] [ 0] [machine1:04130] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x[machine1:04131] [ 0] [machine1:04132] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0([machine1:04140] [ 1] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0/lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7f91550a8890]
          [machine1:04129] [ 1] f890)[0x7f179f424890]
          [machine1:04130] [ 1] /lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7ff0b777e890]
          [machine1:04131] [ 1] [machine1:04133] [ 0] +0xf890)[0x7f8564847890]
          [machine1:04132] [ 1] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x14)[0x7f4e48b17614]
          [machine1:04134] (+0xf890)[0x7fb2979c7890]
          /lib/x86_64-linux-gnu/libc.so.6(cfree+0x14)[0x7f179f0e6614]
          [machine1:04130] [ 2] ./explicitPar[0x401c48]
          /lib/x86_64-linux-gnu/libpthread.so.0[ 2] ./explicitPar[0x401c48]
          [machine1:04134] [ 3] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x14)[0x7f8564509614]
          [machine1:04132] (+0xf890/lib/x86_64-linux-gnu/libc.so.6(cfree+0x14)[0x7f9154d6a614]
          [machine1:04129] [machine1:04140] [ 1] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x14)[0x7ff0b7440614]
          [machine1:04131] [machine1:04130] [ 3] /lib/x86_64-linux-gnu/libc.so.6(/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x[ 2] ./explicitPar[0x401c48]
          [machine1:04132] [ 3] [ 2] ./explicitPar[0x401c48]
          [machine1:04129] [ 3] [ 2] ./explicitPar[0x401c48]
          [machine1:04131] [ 3] __libc_start_main+0xf5)[0x7f179f08bb45]
          [machine1:04130] [ 4] ./explicitPar[0x400e49]
          [machine1:04130] *** End of error message ***
          f5)[0x7f4e48abcb45]
          [machine1:04134] )[0x7f22f8bb2890]
          [machine1:04133] /lib/x86_64-linux-gnu/libc.so.6/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7ff0b73e5b45[ 4] ./explicitPar[0x400e49]
          /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[ 1] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f9154d0fb45]
          [machine1:04129] ]
          [machine1:04131] [ 4] ./explicitPar[0x7f85644aeb45]
          [machine1:04132] /lib/x86_64-linux-gnu/libc.so.6(cfree[ 0] [ 4] ./explicitPar[0x400e49]
          [machine1:04129] *** End of error message ***
          (cfree+0x14)[0x7fb297689614]
          [machine1:04140] [ 2] ./explicitPar[0x401c48[machine1:04134] *** End of error message ***
          [0x400e49]
          [machine1:04131] *** End of error message ***
          [ 4] ./explicitPar[0x400e49]
          [machine1:04132] *** End of error message ***
          +0x14)[0x7f22f8874614]
          [machine1:04133] ]
          [machine1:04140] [ 3] [ 2] ./explicitPar/lib/x86_64-linux-gnu/libc.so.6[0x401c48]
          [machine1:04133] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fb29762eb45]
          [machine1:04140] [ 4] (__libc_start_main+0xf5)[0x./explicitPar[0x7f22f8819b45]
          [machine1:04133] 400e49]
          [machine1:04140] *** End of error message ***
          [ 4] ./explicitPar[0x400e49]
          [machine1:04133] *** End of error message ***
          /lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7ff0d9907890]
          [machine1:04142] [ 1] --------------------------------------------------------------------------
          mpirun noticed that process rank 1 with PID 0 on node machine1 exited on signal 11 (Segmentation fault).

If I free the array with only:

   free(x);

and comment out this part:

/*for (i=0;i<=size_tot_y-1;i++) {
      free(x[i]);      
   }
 */

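then I no longer get the error shown above, so the problem comes from the way the arrays are freed in the MPI version.

Why is the second way of freeing the arrays wrong? I thought the arrays would be freed the same way in both cases, but apparently not.

Any help or comment is welcome. Regards.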

1 answer:

Answer 0: (score: 0)

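Array allocation and deallocation must be symmetric: every pointer returned by malloc() is passed to free() exactly once, and nothing else is.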

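You declared the 2D array as double **, so it is really an array of pointers to arrays of double. In the sequential version, you issued one malloc() for the array of row pointers and then one malloc() for each row. Your rows are not in contiguous memory, but that is fine.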

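That layout generally does not work well with MPI, because you may pass the 2D array to MPI functions that expect a contiguous data layout. So in the MPI version you issued one malloc() for the array of row pointers (unchanged so far) and then a single malloc() for all the rows together. You then filled the first array with pointers into the second. Only x[0] holds an address returned by malloc(); x[1] through x[size_tot_y-1] point into the middle of that block, and passing such a pointer to free() is undefined behavior, which is why the backtrace shows the crash inside cfree. So when deallocating the 2D array, you need exactly two free() calls.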

So the correct way to deallocate the x array is:

free(x[0]);
free(x);