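I get a strange difference between the sequential and the MPI versions of a small code that computes values on a grid.
The sequential version is as follows: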
#include <stdlib.h>

int main() {
    /* Array */
    double **x;
    int i;  /* size_tot_x and size_tot_y (grid sizes) are defined elsewhere */
    /* Allocation of 2D arrays */
    x = malloc(size_tot_y*sizeof(*x));
    for (i=0;i<=size_tot_y-1;i++) {
        x[i] = malloc(size_tot_x*sizeof(**x));
    }
    /* Do various computations */
    /* End of code */
    /* Free all arrays */
    for (i=0;i<=size_tot_y-1;i++) {
        free(x[i]);
    }
    free(x);
    return 0;
}
This sequential version works fine, and all the arrays (x, x0) appear to be correct.
Now, the MPI version looks like this:
#include <stdlib.h>

int main() {
    /* Array */
    double **x;
    double *xfinal;
    int i, j;  /* size_tot_x and size_tot_y (grid sizes) are defined elsewhere */
    /* Allocate size_tot_y rows */
    x = malloc(size_tot_y*sizeof(*x));
    /* Allocate 2D Contiguous arrays for x */
    x[0] = malloc(size_tot_x*size_tot_y*sizeof(**x));
    /* Loop on rows */
    for (j=1;j<size_tot_y;j++) {
        /* Point x[j] (and x0[j]) at the start of row j in the block */
        x[j] = x[0] + j*size_tot_x;
    }
    /* Do various computations */
    /* End of MPI code */
    /* Free all arrays */
    for (i=0;i<=size_tot_y-1;i++) {
        free(x[i]);
    }
    free(x);
    return 0;
}
At run time I get the following error:
[machine1:04130] *** Process received signal ***
[machine1:04130] Signal: Segmentation fault (11)
[machine1:04130] Signal code: Address not mapped (1)
[machine1:04130] Failing at address: 0x7f179c020838
[machine1:04130] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xf890)[0x7f179f424890]
[machine1:04130] [ 1] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x14)[0x7f179f0e6614]
[machine1:04130] [ 2] ./explicitPar[0x401c48]
[machine1:04130] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f179f08bb45]
[machine1:04130] [ 4] ./explicitPar[0x400e49]
[machine1:04130] *** End of error message ***
[... processes 04129, 04131, 04132, 04133, 04134, 04140 and 04142 report the same segmentation fault at other addresses; their interleaved backtraces, where legible, show the same call chain through cfree ...]
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 0 on node machine1 exited on signal 11 (Segmentation fault).
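If I free the array only with

free(x);

that is, with this part commented out:

/*for (i=0;i<=size_tot_y-1;i++) {
    free(x[i]);
}
*/

then I no longer get the error above, so the problem comes from the way the array is freed in the MPI version.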
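Why is the second way of freeing the array wrong? I expected the freeing to be the same in both cases, but apparently it is not.
Any help or comments are welcome. Regards.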
Answer 0 (score: 0)
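Array allocation and deallocation must be symmetric: every malloc() needs exactly one matching free(), called on the very pointer that malloc() returned.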
You declared your 2D arrays as double **, so they are really arrays of pointers to arrays of double.
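In the sequential version you issue one malloc() for the array of row pointers, then one malloc() for each row. Your rows are not in contiguous memory, but that is fine there.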
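For illustration, here is a minimal sketch of the sequential layout wrapped in a pair of helpers. The names alloc_rows and free_rows are hypothetical, not taken from the question. The point is the symmetry: one malloc() for the row pointers plus one per row, undone by one free() per row plus one for the row pointers.

```c
#include <stdlib.h>

/* Hypothetical helper: one malloc() for the row-pointer array,
   then one malloc() per row (rows are NOT contiguous). */
static double **alloc_rows(size_t ny, size_t nx)
{
    double **x = malloc(ny * sizeof *x);
    if (x == NULL)
        return NULL;
    for (size_t j = 0; j < ny; j++) {
        x[j] = malloc(nx * sizeof **x);
        if (x[j] == NULL) {
            /* Undo the rows allocated so far, then the pointer array. */
            while (j-- > 0)
                free(x[j]);
            free(x);
            return NULL;
        }
    }
    return x;
}

/* Symmetric counterpart: one free() per row, then one for the pointers.
   Each pointer passed to free() is exactly one returned by malloc(). */
static void free_rows(double **x, size_t ny)
{
    if (x == NULL)
        return;
    for (size_t j = 0; j < ny; j++)
        free(x[j]);
    free(x);
}
```

With the question's names, usage would be x = alloc_rows(size_tot_y, size_tot_x); and, at the end, free_rows(x, size_tot_y);.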
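This approach usually does not work with MPI, because you are likely to pass the 2D array to MPI functions that expect a contiguous data layout.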
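So in the MPI version you issue one malloc() for the row pointers (unchanged so far), and then a single malloc() for all the rows. You then set up the first array so that its pointers point into the second one.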
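Consequently, when deallocating the 2D array you must issue exactly two free() calls. Only x[0] was returned by malloc(); x[1] to x[size_tot_y-1] point into the middle of that same block, so passing them to free() is undefined behavior, which is consistent with the backtraces ending in cfree.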
So the correct way to deallocate the x array is:

free(x[0]);
free(x);
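A minimal sketch of the contiguous layout as a symmetric pair, assuming the hypothetical helper names alloc_contig, free_contig and fill_flat (none of them come from the question). It uses exactly two malloc() and two free() calls, and fill_flat writes the whole grid through the single flat pointer &x[0][0], the way an MPI call given that pointer and a count of ny * nx would read it. No MPI calls are made, so it builds with a plain C compiler.

```c
#include <stdlib.h>

/* Hypothetical helper: contiguous 2D array in two allocations. */
static double **alloc_contig(size_t ny, size_t nx)
{
    if (ny == 0 || nx == 0)
        return NULL;
    double **x = malloc(ny * sizeof *x);      /* allocation 1: row pointers */
    if (x == NULL)
        return NULL;
    x[0] = malloc(ny * nx * sizeof **x);      /* allocation 2: all values */
    if (x[0] == NULL) {
        free(x);
        return NULL;
    }
    /* x[1..ny-1] point INTO allocation 2; they are not malloc() results. */
    for (size_t j = 1; j < ny; j++)
        x[j] = x[0] + j * nx;
    return x;
}

/* Symmetric counterpart: exactly two free() calls,
   never free(x[j]) for j >= 1. */
static void free_contig(double **x)
{
    if (x == NULL)
        return;
    free(x[0]);  /* releases every row at once */
    free(x);     /* releases the row-pointer array */
}

/* The grid is one flat buffer of ny*nx doubles starting at &x[0][0];
   this is the kind of buffer a call such as
   MPI_Send(&x[0][0], ny * nx, MPI_DOUBLE, ...) would read. */
static void fill_flat(double *buf, size_t n)
{
    for (size_t k = 0; k < n; k++)
        buf[k] = (double)k;
}
```

After fill_flat(&x[0][0], ny * nx), x[j][i] holds j * nx + i, which confirms that the rows follow each other in memory.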