Question

In my program I need to do some matrix multiplication with MPI. When I run it, I get the following error:

=====================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 139
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
=====================================================================================
APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)

It executes:

 printf("Sent a\n");

The error occurs at:

 MPI_Send(&b, nColA*nColB, MPI_FLOAT, dest, mtype, MPI_COMM_WORLD);

It never executes:

 printf("Sent b\n");

I can't figure out why.

Can you help me?

void multiplicaMatriz (int taskid, int numtasks, float **a, float **b, float **c, long int nLinA, long int nColA, long int nLinB, long int nColB)
{
    long int    i, j, k, rc;           /* misc */

    int numworkers,        /* number of worker tasks */
    source,                /* task id of message source */
    dest,                  /* task id of message destination */
    mtype,                 /* message type */
    rows,                  /* rows of matrix A sent to each worker */
    averow, extra, offset; /* used to determine rows sent to each worker */

    MPI_Status status;

    numworkers = numtasks-1;


   /**************************** master task ************************************/
   if (taskid == MASTER)
   {
      printf("mpi_mm has started with %d tasks.\n",numtasks);

      /* Send matrix data to the worker tasks */
      averow = nLinA/numworkers;
      extra = nLinA%numworkers;
      offset = 0;
      mtype = FROM_MASTER;
      for (dest=1; dest<=numworkers; dest++)
      {
         rows = (dest <= extra) ? averow+1 : averow;    
         printf("Sending %d rows to task %d offset=%d\n",rows,dest,offset);
         MPI_Send(&offset, 1, MPI_INT, dest, mtype, MPI_COMM_WORLD);
         printf("Sent offset %d\n", offset);
         MPI_Send(&rows, 1, MPI_INT, dest, mtype, MPI_COMM_WORLD);
         printf("Sent rows %d\n", rows);
         MPI_Send(&a[offset][0], rows*nColA, MPI_FLOAT, dest, mtype,
                   MPI_COMM_WORLD);
         printf("Sent a\n");          
         MPI_Send(&b, nColA*nColB, MPI_FLOAT, dest, mtype, MPI_COMM_WORLD);
         printf("Sent b\n");
         offset = offset + rows;
      }

      /* Receive results from worker tasks */
      mtype = FROM_WORKER;
      for (i=1; i<=numworkers; i++)
      {
         source = i;
         MPI_Recv(&offset, 1, MPI_INT, source, mtype, MPI_COMM_WORLD, &status);
         MPI_Recv(&rows, 1, MPI_INT, source, mtype, MPI_COMM_WORLD, &status);
         MPI_Recv(&c[offset][0], rows*nColB, MPI_FLOAT, source, mtype, 
                  MPI_COMM_WORLD, &status);
         printf("Received results from task %d\n",source);
      }

      /* Print results */
      printf("******************************************************\n");
      printf("Result Matrix:\n");
      for (i=0; i<nLinA; i++)
      {
         printf("\n"); 
         for (j=0; j<nColB; j++) 
            printf("%6.2f   ", c[i][j]);
      }
      printf("\n******************************************************\n");
      printf ("Done.\n");
   }


   /**************************** worker task ************************************/
   if (taskid > MASTER)
   {
      mtype = FROM_MASTER;
      MPI_Recv(&offset, 1, MPI_INT, MASTER, mtype, MPI_COMM_WORLD, &status);
      MPI_Recv(&rows, 1, MPI_INT, MASTER, mtype, MPI_COMM_WORLD, &status);
      MPI_Recv(&a, rows*nColA, MPI_FLOAT, MASTER, mtype, MPI_COMM_WORLD, &status);
      MPI_Recv(&b, nColA*nColB, MPI_FLOAT, MASTER, mtype, MPI_COMM_WORLD, &status);

      for (k=0; k<nColB; k++)
         for (i=0; i<rows; i++)
         {
            c[i][k] = 0.0;
            for (j=0; j<nColA; j++)
               c[i][k] = c[i][k] + a[i][j] * b[j][k];
         }
      mtype = FROM_WORKER;
      MPI_Send(&offset, 1, MPI_INT, MASTER, mtype, MPI_COMM_WORLD);
      MPI_Send(&rows, 1, MPI_INT, MASTER, mtype, MPI_COMM_WORLD);
      MPI_Send(&c, rows*nColB, MPI_FLOAT, MASTER, mtype, MPI_COMM_WORLD);
   }
 }

Answer 1

This is caused by accessing b incorrectly.

Read this declaration carefully:

int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm);

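Look at the buf parameter: the function treats that void* as pointing to elements of type datatype. When you call MPI_Send(&b, nColA*nColB, MPI_FLOAT, dest, mtype, MPI_COMM_WORLD); you are passing &b. That is the address of b itself, and its type is float***. The function treats it as a float* and reads nColA*nColB floats starting there, which causes the error.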
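To make the type mismatch concrete, here is a minimal sketch in plain C (no MPI required). The names read_start and pointer_and_data_differ are hypothetical helpers invented for illustration; read_start only stands in for any routine that, like MPI_Send, takes an untyped buffer.

```c
/* Hypothetical stand-in for a routine that, like MPI_Send, accepts an
   untyped buffer: it reports the address it would start reading from. */
const void *read_start(const void *buf)
{
    return buf;
}

/* Returns 1 when &b and &b[0][0] are different addresses. */
int pointer_and_data_differ(float **b)
{
    /* float ***: where the pointer variable b itself is stored */
    const void *wrong = read_start(&b);
    /* float *: where the first matrix element is stored */
    const void *right = read_start(&b[0][0]);
    return wrong != right;
}
```

Because both arguments convert silently to void*, the compiler does not complain about &b, yet the two addresses name unrelated memory: one holds a single pointer, the other holds the floats.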

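In the other call to MPI_Send(), you pass &a[offset][0], which has the correct float* type. Try passing &b[offset][0], or whichever ordering of those array indices you need for the multiplication to come out right.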
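One caveat worth checking, since the question does not show how the matrices are allocated: sending rows*nColA floats starting at &a[offset][0] is only valid if the rows sit back to back in memory. The sketch below shows one common way to get that layout while keeping the a[i][j] syntax. The names alloc_matrix and free_matrix are hypothetical, and this is an assumption about what the program may need, not a description of the asker's code.

```c
#include <stdlib.h>

/* Allocate a rows x cols matrix whose elements live in ONE contiguous
   block. m[i] points at row i inside that block, so &m[0][0] addresses
   rows*cols consecutive floats. Returns NULL on failure. */
float **alloc_matrix(size_t rows, size_t cols)
{
    if (rows == 0 || cols == 0)
        return NULL;

    float **m = malloc(rows * sizeof *m);        /* row-pointer table */
    if (m == NULL)
        return NULL;

    float *data = calloc(rows * cols, sizeof *data); /* zeroed data block */
    if (data == NULL) {
        free(m);
        return NULL;
    }

    for (size_t i = 0; i < rows; i++)
        m[i] = data + i * cols;
    return m;
}

/* Release a matrix obtained from alloc_matrix. */
void free_matrix(float **m)
{
    if (m != NULL) {
        free(m[0]); /* the single data block */
        free(m);    /* the row-pointer table */
    }
}
```

With this layout, a call such as MPI_Send(&b[0][0], nColA*nColB, MPI_FLOAT, dest, mtype, MPI_COMM_WORLD) reads memory that really holds nColA*nColB floats. The worker branch in the question passes &a, &b and &c to MPI_Recv and MPI_Send as well, so the same float* fix would presumably be needed there.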

I'm not going to work out those indices for you; that's your job. But this is what causes the segfault.
