MPI: cancelling a non-blocking send

Date: 2010-12-14 11:58:50

Tags: c parallel-processing mpi

I'm using the Open MPI library to implement the following algorithm: we have two processes, p1 and p2. Both perform some iterations, and at the end of every iteration they communicate their results. The problem is that execution is not necessarily balanced, so p1 may execute 10 iterations in the time p2 executes 1. Even so, I want p2 to read the most recent result from the last iteration p1 executed.

So my idea is that p1 sends its result at every iteration. But before sending the result of iteration i, it should check whether p2 has actually read the information from iteration i-1. If not, it should cancel the previous send, so that when p2 reads from p1 it reads the most recent result.

Unfortunately, I don't know how to do this. I've tried using MPI_Cancel, as in the following code:

#include <stdio.h>
#include <unistd.h>   /* for sleep() */
#include <mpi.h>

int main (int argc, char *argv[]){

    int myrank, numprocs;
    MPI_Status status;
    MPI_Request request;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    if(myrank == 0){
        int send_buf = 1, flag;
        MPI_Isend(&send_buf, 1, MPI_INT, 1, 123, MPI_COMM_WORLD, 
                  &request);
        MPI_Cancel(&request);
        MPI_Wait(&request, &status);
        MPI_Test_cancelled(&status, &flag);
        if (flag) printf("Send cancelled\n");
        else printf("Send NOT cancelled\n");
        send_buf = 2;
        MPI_Isend(&send_buf, 1, MPI_INT, 1, 123, MPI_COMM_WORLD, 
                  &request);
    }
    else {
        sleep(5);
        int msg;
        MPI_Recv(&msg, 1, MPI_INT, 0, 123,
                 MPI_COMM_WORLD, &status);
        printf("%d\n", msg);
    }
    MPI_Finalize();

    return 0;
}

But when I execute it, it says the send could not be cancelled, and p2 prints 1 rather than 2.

I was wondering whether there is any way to achieve what I'm proposing, or whether there is an alternative way to code the behaviour between p1 and p2.

3 Answers:

Answer 0 (score: 5)

I would turn the control of the communication around: instead of p1 sending unnecessary messages that it then has to cancel, p2 should signal that it is ready to receive, and p1 should send only then. In the meantime, p1 simply keeps overwriting its send buffer with the latest result.

In (untested) code:

if ( rank == 0 )
{
    int ready;
    MPI_Request p2_request;
    MPI_Status p2_status;
    // initial request
    MPI_Irecv(&ready, 1, MPI_INT, 1, 123, MPI_COMM_WORLD, &p2_request);
    for (int i=0; true; i++)
    {
        sleep(1);
        MPI_Test(&p2_request, &ready, &p2_status);
        if ( ready )
        {
            // blocking send: p2 is ready to receive
            MPI_Send(&i, 1, MPI_INT, 1, 123, MPI_COMM_WORLD);
            // post new request
            MPI_Irecv(&ready, 1, MPI_INT, 1, 123, MPI_COMM_WORLD, &p2_request);
        }
    }
}
else
{
    int msg;
    MPI_Status status;
    while (true)
    {
        sleep(5);
        // actual message content doesn't matter, just let p1 know we're ready
        MPI_Send(&msg, 1, MPI_INT, 0, 123, MPI_COMM_WORLD);
        // receive message
        MPI_Recv(&msg, 1, MPI_INT, 0, 123, MPI_COMM_WORLD, &status);
    }
}

Like I said, this is untested code, but you can probably see what I'm getting at. MPI_Cancel should only be used when things go drastically wrong: no message should be cancelled during normal execution.

Answer 1 (score: 5)

An entirely different way to do this is to use MPI one-sided communication (see e.g. http://www.linux-mag.com/id/1793). Note, though, that doing the passive communication, which is what you really want here, is fairly tricky (although the pairwise mpi_win_post and mpi_win_start are easier), and that the one-sided stuff is expected to change substantially in MPI-3, so I don't know how far down that road I'd advise you to go.
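To make the one-sided idea concrete, here is a minimal untested sketch (the buffer name `latest` and the iteration count are made up for the example): p1 uses MPI_Put inside a passive-target lock/unlock epoch to overwrite a single double exposed by p2's window, so p2 always holds the freshest value without posting any matching receive.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    double latest = -1.0;   /* p2 exposes this through the window */
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* rank 1 exposes one double; rank 0 contributes an empty window */
    MPI_Win_create(rank == 1 ? &latest : NULL,
                   rank == 1 ? (MPI_Aint)sizeof(double) : 0,
                   sizeof(double), MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    if (rank == 0) {
        int i;
        for (i = 0; i < 10; i++) {
            double result = 100.0 + i;   /* stand-in for this iteration's result */
            /* passive-target epoch: overwrite p2's copy in place */
            MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 1, 0, win);
            MPI_Put(&result, 1, MPI_DOUBLE, 1, 0, 1, MPI_DOUBLE, win);
            MPI_Win_unlock(1, win);      /* the put is complete at the target here */
        }
    }

    MPI_Barrier(MPI_COMM_WORLD);         /* for the demo: all puts have landed */

    if (rank == 1) {
        /* lock our own window before reading the exposed memory locally */
        MPI_Win_lock(MPI_LOCK_SHARED, 1, 0, win);
        printf("p2 sees latest result %f\n", latest);
        MPI_Win_unlock(1, win);
    }

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

In a real version p2 would sample the window between its own iterations rather than after a barrier; the barrier is only there so the demo prints the final value deterministically.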

More directly related to your first attempt here: rather than cancelling messages (which, as suggested above, is pretty drastic), it is probably much easier to just read through all the queued messages (MPI guarantees that messages will not overtake each other; the only caveat is if you are using MPI_THREAD_MULTIPLE and have multiple threads sending within one MPI task, in which case the order is only loosely defined):

#include <stdio.h>
#include <mpi.h>
#include <stdlib.h>
#include <unistd.h>
#include <math.h>

void compute() {
    const int maxusecs=500;
    unsigned long sleepytime=(unsigned long)round(((float)rand()/RAND_MAX)*maxusecs);

    usleep(sleepytime);
}

int main(int argc, char** argv)
{
  int rank, size, i;
  int otherrank;
  const int niters=10;
  const int tag=5;
  double newval;
  double sentvals[niters+1];
  double othernewval;
  MPI_Request reqs[niters+1];
  MPI_Status stat;
  int ready;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  if (size != 2) {
     fprintf(stderr,"This assumes 2 processes\n");
     MPI_Finalize();
     exit(-1);
  }

  otherrank = (rank == 0 ? 1 : 0);
  srand(rank);

  compute();
  newval = rank * 100. + 0;
  sentvals[0] = newval;
  MPI_Isend(&(sentvals[0]), 1, MPI_DOUBLE, otherrank, tag, MPI_COMM_WORLD, &(reqs[0]));
  MPI_Recv (&othernewval,   1, MPI_DOUBLE, otherrank, tag, MPI_COMM_WORLD, &stat);
  for (i=0; i<niters; i++) {

      MPI_Iprobe(otherrank, tag, MPI_COMM_WORLD, &ready, &stat);
      while (ready) {
          MPI_Recv(&othernewval, 1, MPI_DOUBLE, otherrank, tag, MPI_COMM_WORLD, &stat);
          printf("%s[%d]: Reading queued data %lf:\n",
                  (rank == 0 ? "" : "\t\t\t\t"), rank, othernewval);
          MPI_Iprobe(otherrank, tag, MPI_COMM_WORLD, &ready, &stat);
      }

      printf("%s[%d]: Got data %lf, computing:\n", 
              (rank == 0 ? "" : "\t\t\t\t"), rank, othernewval);
      compute();

      /* update my data */ 
      newval = rank * 100. + i + 1;
      printf("%s[%d]: computed %lf, sending:\n", 
              (rank == 0 ? "" : "\t\t\t\t"), rank, newval);
      sentvals[i+1] = newval;
      MPI_Isend(&(sentvals[i+1]), 1, MPI_DOUBLE, otherrank, tag, MPI_COMM_WORLD, &(reqs[i+1]));
   }


  MPI_Finalize();

  return 0;
}

Running this gives you (notice that just because data is sent doesn't mean it has been received at the time of printing):

[0]: Got data 100.000000, computing:
                                [1]: Got data 0.000000, computing:
[0]: computed 1.000000, sending:
[0]: Got data 100.000000, computing:
                                [1]: computed 101.000000, sending:
                                [1]: Got data 0.000000, computing:
[0]: computed 2.000000, sending:
[0]: Got data 100.000000, computing:
                                [1]: computed 102.000000, sending:
                                [1]: Reading queued data 1.000000:
                                [1]: Got data 1.000000, computing:
[0]: computed 3.000000, sending:
[0]: Reading queued data 101.000000:
[0]: Got data 101.000000, computing:
                                [1]: computed 103.000000, sending:
                                [1]: Reading queued data 2.000000:
                                [1]: Got data 2.000000, computing:
[0]: computed 4.000000, sending:
                                [1]: computed 104.000000, sending:
[0]: Reading queued data 102.000000:
                                [1]: Reading queued data 3.000000:
                                [1]: Got data 3.000000, computing:
[0]: Got data 102.000000, computing:
[0]: computed 5.000000, sending:
[0]: Reading queued data 103.000000:
[0]: Got data 103.000000, computing:
                                [1]: computed 105.000000, sending:
                                [1]: Reading queued data 4.000000:
                                [1]: Got data 4.000000, computing:
[0]: computed 6.000000, sending:
[0]: Reading queued data 104.000000:
[0]: Got data 104.000000, computing:
                                [1]: computed 106.000000, sending:
                                [1]: Reading queued data 5.000000:
                                [1]: Got data 5.000000, computing:
[0]: computed 7.000000, sending:
[0]: Reading queued data 105.000000:
[0]: Got data 105.000000, computing:
                                [1]: computed 107.000000, sending:
                                [1]: Reading queued data 6.000000:
                                [1]: Got data 6.000000, computing:
[0]: computed 8.000000, sending:
[0]: Reading queued data 106.000000:
[0]: Got data 106.000000, computing:
                                [1]: computed 108.000000, sending:
                                [1]: Reading queued data 7.000000:
                                [1]: Got data 7.000000, computing:
[0]: computed 9.000000, sending:
[0]: Reading queued data 107.000000:
[0]: Got data 107.000000, computing:
                                [1]: computed 109.000000, sending:
                                [1]: Reading queued data 8.000000:
                                [1]: Got data 8.000000, computing:
[0]: computed 10.000000, sending:
                                [1]: computed 110.000000, sending:

Note that this is just demonstration code; a final version really needs waitalls and more iprobes at the end, to free any pending requests and flush any waiting messages.

Answer 2 (score: 0)

Do your environment and MPI distribution support multithreading? If so, you could create a thread in P1 that computes the values and stores the result of each iteration in a variable shared with P1's main thread (with writes protected by a semaphore). Then, as suszterpatt suggested above, have P2 send an "I'm ready" message to P1, and have P1 respond with the value from the most recent iteration.