I'm using MPI_Type_create_struct() to send a (particle) struct, e.g. as shown here, or explained in detail here. I collect all particles that are headed to a specific process, memcpy() them into a send buffer and MPI_Isend() them. So far, so good. MPI_Iprobe()'ing for the message gives me the correct number of particles. So I MPI_Recv() the buffer and extract the data (by now even by copying the structs one by one). No matter how many particles I send, only the first particle's data is correct.
There are three possible mistakes:

MPI_Type_create_struct() with offsetof() doesn't create a correct map of my struct. Maybe my struct contains invisible padding, as explained in the second link.

(Sorry for the really ugly presentation of the code, I wasn't able to present it in a decent way. You'll find the code on GitHub here - the line is marked - as well!)
Here is the construction of the MPI datatype:

typedef struct {
    int ID;
    double x[DIM];
} pchase_particle_t;
const int items = 2;
int block_lengths[2] = {1, DIM};
MPI_Datatype mpi_types[2] = {MPI_INT, MPI_DOUBLE};
MPI_Aint offsets[2];
offsets[0] = offsetof(pchase_particle_t, ID);
offsets[1] = offsetof(pchase_particle_t, x);
MPI_Type_create_struct(items, block_lengths, offsets, mpi_types, &W->MPI_Particle);
MPI_Type_commit(&W->MPI_Particle);
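
Regarding the padding concern above: one way to check whether the committed type actually matches the C struct layout is to compare the type's MPI extent with sizeof(pchase_particle_t) and, if they differ, resize the type. This is only a minimal sketch of that check, not code from the project:

MPI_Aint lb, extent;
MPI_Type_get_extent(W->MPI_Particle, &lb, &extent);
if (extent != (MPI_Aint) sizeof(pchase_particle_t)) {
    /* the struct has trailing padding: make consecutive array elements
     * stride by sizeof(pchase_particle_t) instead of the raw extent */
    MPI_Datatype resized;
    MPI_Type_create_resized(W->MPI_Particle, 0, sizeof(pchase_particle_t), &resized);
    MPI_Type_commit(&resized);
    MPI_Type_free(&W->MPI_Particle);
    W->MPI_Particle = resized;
}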
The sending:
/* handle all mpi send/recv status data */
MPI_Request *send_request = P4EST_ALLOC(MPI_Request, W->p4est->mpisize);
MPI_Status *recv_status = P4EST_ALLOC(MPI_Status, W->p4est->mpisize);
/* setup send/recv buffers */
pchase_particle_t **recv_buf = P4EST_ALLOC(pchase_particle_t *, num_senders);
pchase_particle_t **send_buf = P4EST_ALLOC(pchase_particle_t *, num_receivers);
int recv_count = 0, recv_length, flag, j;

/* send all particles to their belonging procs */
for (i = 0; i < num_receivers; i++) {
    /* resolve particle list for proc i */
    sc_list_t *tmpList = *((sc_list_t **) sc_array_index(W->particles_to, receivers[i]));
    pchase_particle_t *tmpParticle;
    int send_count = 0;

    /* get space for the particles to be sent */
    send_buf[i] = P4EST_ALLOC(pchase_particle_t, tmpList->elem_count);
    /* copy all particles into the send buffer and remove them from this proc */
    while (tmpList->first != NULL) {
        tmpParticle = sc_list_pop(tmpList);
        memcpy(send_buf[i] + send_count * sizeof(pchase_particle_t), tmpParticle, sizeof(pchase_particle_t));
        /* free particle */
        P4EST_FREE(tmpParticle);
        /* update particle counter */
        send_count++;
    }

    /* print send buffer */
    for (j = 0; j < send_count; j++) {
        pchase_particle_t *tmpParticle = send_buf[i] + j * sizeof(pchase_particle_t);
        printf("[pchase %i sending] particle[%i](%lf,%lf)\n",
               W->p4est->mpirank, tmpParticle->ID, tmpParticle->x[0], tmpParticle->x[1]);
    }
    printf("[pchase %i sending] particle count: %i\n", W->p4est->mpirank, send_count);

    /* send particles to right owner */
    mpiret = MPI_Isend(send_buf[i], send_count, W->MPI_Particle, receivers[i], 13, W->p4est->mpicomm, &send_request[i]);
    SC_CHECK_MPI(mpiret);
}
and the receiving:
recv_count = 0;
/* check for messages until all arrived */
while (recv_count < num_senders) {
    /* probe if any of the senders has already sent its message */
    for (i = 0; i < num_senders; i++) {
        MPI_Iprobe(senders[i], MPI_ANY_TAG, W->p4est->mpicomm, &flag, &recv_status[i]);
        if (flag) {
            /* resolve number of particles receiving */
            MPI_Get_count(&recv_status[i], W->MPI_Particle, &recv_length);
            printf("[pchase %i receiving message] %i particles arrived from sender %i with tag %i\n",
                   W->p4est->mpirank, recv_length, recv_status[i].MPI_SOURCE, recv_status[i].MPI_TAG);
            /* get space for the particles to be received */
            recv_buf[recv_count] = P4EST_ALLOC(pchase_particle_t, recv_length);
            /* receive a list with recv_length particles */
            mpiret = MPI_Recv(recv_buf[recv_count], recv_length, W->MPI_Particle, recv_status[i].MPI_SOURCE,
                              recv_status[i].MPI_TAG, W->p4est->mpicomm, &recv_status[i]);
            SC_CHECK_MPI(mpiret);
            /* insert all received particles into the push list */
            pchase_particle_t *tmpParticle;
            for (j = 0; j < recv_length; j++) {
                /* retrieve all particle details from recv_buf */
                tmpParticle = recv_buf[recv_count] + j * sizeof(pchase_particle_t);
                pchase_particle_t *addParticle = P4EST_ALLOC(pchase_particle_t, 1);
                addParticle->ID = tmpParticle->ID;
                addParticle->x[0] = tmpParticle->x[0];
                addParticle->x[1] = tmpParticle->x[1];
                printf("[pchase %i receiving] particle[%i](%lf,%lf)\n",
                       W->p4est->mpirank, addParticle->ID, addParticle->x[0], addParticle->x[1]);
                /* push received particle to push list and update world counter */
                sc_list_append(W->particle_push_list, addParticle);
                W->n_particles++;
            }
            /* we received another particle list */
            recv_count++;
        }
    }
}
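
Not shown in this excerpt: completing the MPI_Isend requests posted above and releasing the send buffers afterwards. Roughly, that step would look like this (a sketch, not the actual project code, assuming one request per receiver in send_request):

/* wait until all non-blocking sends have completed,
 * then it is safe to free the send buffers */
mpiret = MPI_Waitall(num_receivers, send_request, MPI_STATUSES_IGNORE);
SC_CHECK_MPI(mpiret);
for (i = 0; i < num_receivers; i++)
    P4EST_FREE(send_buf[i]);
P4EST_FREE(send_buf);
P4EST_FREE(send_request);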
EDIT: re-indented.. EDIT: Only the first particle's data being correct means that all of its attributes (ID and coordinates) are identical to those of the particle that was sent. The others, however, are filled with zeros, i.e. ID = 0, x[0] = 0.0, x[1] = 0.0. Maybe that's a hint towards the solution.
Answer (score: 1):
There is an error in your pointer arithmetic. send_buf[i] is already of type pchase_particle_t *, hence send_buf[i] + j * sizeof(pchase_particle_t) does not point to the j-th element of the i-th buffer, but rather to the (j * sizeof(pchase_particle_t))-th element. Your particles are therefore not stored contiguously in memory but are separated by sizeof(pchase_particle_t) - 1 empty array elements. These get sent instead of the correct particles, because the MPI_Send call accesses the buffer memory contiguously. The same applies to the receiver's code.
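
To make the stride explicit, here is a tiny standalone illustration (not taken from the question's code) of how C pointer arithmetic already scales by the element size:

pchase_particle_t buf[2 * sizeof(pchase_particle_t)];           /* made large enough for the demo */
pchase_particle_t *p = buf;
pchase_particle_t *ok    = p + 1;                               /* &buf[1]: one element further */
pchase_particle_t *wrong = p + 1 * sizeof(pchase_particle_t);   /* &buf[sizeof(pchase_particle_t)]: many elements further */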
You do not see the error in the sender's code because your debug prints use the same erroneous pointer arithmetic and hence access memory with the same stride. I guess your send counts are small and the memory gets allocated on the data segment heap, otherwise you should have received a SIGSEGV for out-of-bounds array access early in the data packing process (e.g. in the memcpy part).

Resolution: do not multiply the array index by sizeof(pchase_particle_t).
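
Applied to the lines from the question, the corrected indexing would look roughly like this (a sketch reusing the question's variable names):

/* sender: pack particles contiguously into the send buffer */
memcpy(send_buf[i] + send_count, tmpParticle, sizeof(pchase_particle_t));
/* sender: debug print of the j-th packed particle */
pchase_particle_t *tmpParticle = send_buf[i] + j;
/* receiver: read the j-th received particle */
tmpParticle = recv_buf[recv_count] + j;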