Error when passing an array with MPI_Send and MPI_Recv

Time: 2018-11-18 17:54:28

Tags: c mpi

I am trying to send and receive an array of doubles with MPI_Send and MPI_Recv, but it does not work.

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <sys/time.h>

#define N 5
#define ITERS 10
#define ARRAY_SIZE (N+2) * (N+2)
// N and ITERS might be input arguments

double **A;

void initialize (double **A)
{
  int i,j;

   for(i =0; i < N+2 ; i++){
     for(j =0; j < N+2 ; j++){
      if(i== 0 || j == 0 || i == (N+1) || j == (N +1) )
        A[i][j] = 0.0;
      else
        A[i][j] = rand() % 10 + 1;
     }
   }
}
void showArray(double **A){
  int i,j;
  printf("\n");
  for(i =0 ; i < N+2 ; i++){
    for(j =0; j < N+2 ; j++){
      printf("%f, ",A[i][j]);
    }
    printf("\n");
  }
}

void stencil(double **A){
  int i,j;
  printf("\n");
  for(i =1 ; i <= N ; i++){
    for(j =1; j <=N ; j++){
      A[i][j] = 0.3 *( A[i][j] + A[i-1][j] + A[i+1][j] + A[i][j-1] + A[i][j+1]);
    }
  }
}


int main(int argc, char * argv[]){

  int MyProc, size,tag=1;
  char msg='A', msg_recpt;
  MPI_Status status;
  double **received_array;

  //showArray(A);
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &MyProc);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  printf("Process # %d started \n", MyProc);
  MPI_Barrier(MPI_COMM_WORLD);

  //allocating received_array
  received_array = malloc((N+2) * sizeof(double *));
  int i;
  for (i=0; i<N+2; i++) {
    received_array[i] = malloc((N+2) * sizeof(double));
  }

  if(MyProc == 0){
    A = malloc((N+2) * sizeof(double *));
    int i;
    for (i=0; i<N+2; i++) {
      A[i] = malloc((N+2) * sizeof(double));
    }
    initialize(A);
    stencil(A);
    showArray(A);
    //printf("sizeof: %d\n",sizeof(A)/sizeof(double));

    MPI_Send(A, ARRAY_SIZE, MPI_DOUBLE, MyProc +1,tag, MPI_COMM_WORLD);
    printf("Proc #%d enviando a #%d\n",MyProc,MyProc+1 );
  }

  if(MyProc > 0 && MyProc < size -1){
    MPI_Recv(received_array, ARRAY_SIZE, MPI_DOUBLE, MyProc- 1, tag, MPI_COMM_WORLD, &status);

    printf("Proc #%d recibe de Proc #%d\n",MyProc,MyProc- 1 );
    //stencil(A);
    printf("Proc #%d enviando a #%d\n",MyProc,MyProc+1 );
    MPI_Send(received_array, ARRAY_SIZE, MPI_DOUBLE, MyProc +1,tag, MPI_COMM_WORLD);
  }

  if(MyProc == size -1 ){
    MPI_Recv(received_array, ARRAY_SIZE, MPI_DOUBLE, MyProc- 1, tag, MPI_COMM_WORLD, &status);
    printf("Proc #%d recibe de Proc #%d\n",MyProc,MyProc- 1 );
    //stencil(A);
  }

  printf("Finishing proc %d\n", MyProc);
  MPI_Barrier(MPI_COMM_WORLD);
  MPI_Finalize();

}

I get this error:

[compute-0-4.local:30784] *** An error occurred in MPI_Recv
[compute-0-4.local:30784] *** on communicator MPI_COMM_WORLD
[compute-0-4.local:30784] *** MPI_ERR_BUFFER: invalid buffer pointer
[compute-0-4.local:30784] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[compute-0-4.local][[28950,1],0][btl_tcp_frag.c:215:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 30784 on
node compute-0-4.local exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[compute-0-4.local:30782] 1 more process has sent help message help-mpi-errors.txt / mpi_errors_are_fatal
[compute-0-4.local:30782] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

Now that I allocate memory for received_array, I get the following error message instead:

[compute-0-0:18176] *** Process received signal ***
[compute-0-0:18177] *** Process received signal ***
[compute-0-0:18177] Signal: Segmentation fault (11)
[compute-0-0:18177] Signal code:  (128)
[compute-0-0:18177] Failing at address: (nil)
[compute-0-0:18176] Signal: Segmentation fault (11)
[compute-0-0:18176] Signal code: Address not mapped (1)
[compute-0-0:18176] Failing at address: 0x10
[compute-0-0:18176] [ 0] /lib64/libpthread.so.0() [0x326fa0f500]
[compute-0-0:18176] [ 1] /opt/openmpi/lib/libmpi.so.1(opal_memory_ptmalloc2_int_malloc+0xae) [0x2b22bf88211e]
[compute-0-0:18176] [ 2] /opt/openmpi/lib/libmpi.so.1(opal_memory_ptmalloc2_malloc+0x57) [0x2b22bf883b87]
[compute-0-0:18176] [ 3] /opt/openmpi/lib/libmpi.so.1(+0x2258f7) [0x2b22bf88b8f7]
[compute-0-0:18176] [ 4] /opt/openmpi/lib/libmpi.so.1(mca_base_param_reg_int_name+0x3f) [0x2b22bf88bd9f]
[compute-0-0:18176] [ 5] /opt/openmpi/lib/libmpi.so.1(ompi_mpi_finalize+0x126) [0x2b22bf6f5fb6]
[compute-0-0:18176] [ 6] ./ej7(main+0x2d2) [0x4010e8]
[compute-0-0:18176] [ 7] /lib64/libc.so.6(__libc_start_main+0xfd) [0x326f21ecdd]
[compute-0-0:18176] [ 8] ./ej7() [0x400ac9]
[compute-0-0:18176] *** End of error message ***
[compute-0-0:18177] [ 0] /lib64/libpthread.so.0() [0x326fa0f500]
[compute-0-0:18177] [ 1] /opt/openmpi/lib/libmpi.so.1(opal_memory_ptmalloc2_int_malloc+0xae) [0x2b52f96ff11e]
[compute-0-0:18177] [ 2] /opt/openmpi/lib/libmpi.so.1(opal_memory_ptmalloc2_malloc+0x57) [0x2b52f9700b87]
[compute-0-0:18177] [ 3] /opt/openmpi/lib/libmpi.so.1(+0x2258f7) [0x2b52f97088f7]
[compute-0-0:18177] [ 4] /opt/openmpi/lib/libmpi.so.1(mca_base_param_reg_int_name+0x3f) [0x2b52f9708d9f]
[compute-0-0:18177] [ 5] /opt/openmpi/lib/libmpi.so.1(ompi_mpi_finalize+0x126) [0x2b52f9572fb6]
[compute-0-0:18177] [ 6] ./ej7(main+0x2d2) [0x4010e8]
[compute-0-0:18177] [ 7] /lib64/libc.so.6(__libc_start_main+0xfd) [0x326f21ecdd]
[compute-0-0:18177] [ 8] ./ej7() [0x400ac9]
[compute-0-0:18177] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 18176 on node compute-0-0.local exited on signal 11 (Segmentation fault).

1 Answer:

Answer 0 (score: 1):

Allocate received_array in the same way you allocate A.

MPI will not allocate the memory for you, even though you pass it the array.
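In other words, the receive buffer must already exist before MPI_Recv is called; MPI only copies data into memory that you provide. A minimal sketch of that rule (the helper name and the COUNT constant are illustrative, not from the original code):

#include <stdlib.h>
#include "mpi.h"

#define COUNT 49  /* (N+2)*(N+2) with N = 5 */

/* Receive COUNT doubles from `source`. The buffer is allocated here,
 * before the call, because MPI_Recv never allocates it for you. */
double *recv_block(int source, int tag, MPI_Comm comm)
{
  double *buf = malloc(COUNT * sizeof(double));  /* allocate first... */
  MPI_Recv(buf, COUNT, MPI_DOUBLE, source, tag,  /* ...then receive   */
           comm, MPI_STATUS_IGNORE);
  return buf;
}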

Then, the issue with the edited question is that you are trying to transfer, with a single MPI send, a square matrix that is allocated as a pointer to pointers via N+2 malloc calls (one per row). That does not work, because what MPI_Send / MPI_Recv do is transfer ARRAY_SIZE contiguous elements starting at the buffer pointer, and the rows of such a matrix are not contiguous in memory (the pointer array itself holds addresses, not doubles).
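If you want to keep the A[i][j] syntax, one option (a sketch of my own, not part of the original answer) is to allocate the storage as one contiguous block and point the row pointers into it; then &A[0][0] is a valid buffer for a single MPI_Send / MPI_Recv of ARRAY_SIZE doubles:

#include <stdlib.h>

/* Allocate a rows x cols matrix whose elements all live in one contiguous
 * block; the row pointers are just offsets into that block. */
double **alloc_contiguous(int rows, int cols)
{
  double *data = malloc((size_t)rows * cols * sizeof(double));
  double **m = malloc((size_t)rows * sizeof(double *));
  for (int i = 0; i < rows; i++)
    m[i] = &data[(size_t)i * cols];
  return m;
}

With this layout, MPI_Send(&A[0][0], ARRAY_SIZE, MPI_DOUBLE, ...) and the matching MPI_Recv(&received_array[0][0], ...) move the whole matrix in one call, and the initialize/stencil/showArray code keeps working unchanged.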

In HPC we directly use a 1D array of ARRAY_SIZE elements and get 2D access through a macro (for instance), because it is fast, cache-friendly, and needs only a single malloc (and free) instead of N+2 calls.
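For instance, a minimal sketch of that flat-array approach (the IDX macro name is mine, just for illustration):

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

#define N 5
#define ARRAY_SIZE ((N + 2) * (N + 2))
/* row-major 2D indexing into the flat 1D array */
#define IDX(i, j) ((i) * (N + 2) + (j))

int main(int argc, char *argv[])
{
  int rank, size, tag = 1;
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  /* one malloc, one free, fully contiguous storage */
  double *A = malloc(ARRAY_SIZE * sizeof(double));

  if (rank == 0) {
    for (int i = 0; i < N + 2; i++)
      for (int j = 0; j < N + 2; j++)
        A[IDX(i, j)] = (i == 0 || j == 0 || i == N + 1 || j == N + 1)
                         ? 0.0 : rand() % 10 + 1;
    if (size > 1)
      MPI_Send(A, ARRAY_SIZE, MPI_DOUBLE, 1, tag, MPI_COMM_WORLD);
  } else if (rank == 1) {
    /* A is already a pointer to ARRAY_SIZE contiguous doubles,
     * so it can be passed to MPI_Recv directly */
    MPI_Recv(A, ARRAY_SIZE, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD, &status);
    printf("rank 1 received A[1][1] = %f\n", A[IDX(1, 1)]);
  }

  free(A);
  MPI_Finalize();
  return 0;
}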