Program gets stuck somewhere

Time: 2015-03-05 18:37:48

Tags: parallel-processing mpi openmpi

I am trying to implement a master-slave model, where the master has an array (acting as a job queue) and sends data to the slave processors. Based on the data received from the master, a slave computes its result and returns the answer to the master. The master receives the result, finds out the rank of the slave that sent the message, and then sends the next job to that slave.

This is the skeleton of the code I have implemented:

        if (my_rank != 0) 
        {
            MPI_Recv(&seed, 1, MPI_FLOAT, 0, tag, MPI_COMM_WORLD, &status);

            //.. some processing

            MPI_Send(&message, 100, MPI_FLOAT, 0, my_rank, MPI_COMM_WORLD);
        } 
        else 
        {
            for (i = 1; i < p; i++) {
                MPI_Send(&A[i], 1, MPI_FLOAT, i, tag, MPI_COMM_WORLD);
            }

            for (i = p; i <= S; i++) {
                MPI_Recv(&buf, 100, MPI_FLOAT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                        MPI_COMM_WORLD, &status);
                //.. processing to find out free slave rank from which above msg was received (y)
                MPI_Send(&A[i], 1, MPI_FLOAT, y, tag, MPI_COMM_WORLD);
            }

            for (i = 1; i < p; i++) {
            MPI_Recv(&buf, 100, MPI_FLOAT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);

                // .. more processing 
            }

        }

If I use 4 processors, with 1 master and 3 slaves, the program sends and receives the messages for the first 3 jobs in the job queue, but after that it hangs. What could be the problem?

1 Answer:

Answer 0 (score: 0)

If this is the entirety of your MPI-based code, it looks like you are missing a while loop around the outside of your client (slave) code. I have done this before, and I usually break it into a taskMaster and peons.
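For reference, here is a minimal sketch of what that missing outer loop could look like on the slave side of the skeleton from the question. The variable names (seed, message, my_rank, status) come from the question; the STOP_TAG sentinel is an assumption about how the master would signal that the queue is empty, not something in the original code:

    if (my_rank != 0)
    {
        while (1)
        {
            MPI_Recv(&seed, 1, MPI_FLOAT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);

            if (status.MPI_TAG == STOP_TAG)   /* assumed sentinel tag sent by the master */
                break;                        /* no more jobs: leave the loop */

            //.. some processing

            MPI_Send(&message, 100, MPI_FLOAT, 0, my_rank, MPI_COMM_WORLD);
        }
    }

With this structure the master also has to send one extra message per slave, tagged STOP_TAG, once the queue is exhausted; otherwise the slaves never return from MPI_Recv. The taskMaster/peon layout below does exactly that with a stopTask value.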

In the taskMaster:

 for (int i = 0; i < commSize; ++i){
    if (i == commRank){ // commRank doesn't have to be 0
        continue;
    }

    if (taskNum < taskCount){
        // tasks is vector<Task>, where I have created a Task
        // class and send it as a stream of bytes
        toSend = tasks.at(taskNum);
        jobList.at(i) = taskNum;  // so we know which rank has which task
        taskNum += 1;
        activePeons += 1;
    } else{
        // stopTask is a flag value to stop receiving peon
        toSend = stopTask;
        allTasksDistributed = true;
    }

    // send the task, with the size of the task as the tag
    taskSize = sizeof(toSend);
    MPI_Send(&toSend, taskSize, MPI_CHAR, i, taskSize, MPI_COMM_WORLD);
}   

MPI_Status status;

while (activePeons > 0){ 
    // get the results from a peon (but figure out who it is coming from and what the size is)
    MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
    MPI_Recv(   &toSend,                    // receive the incoming task (with result data)
                status.MPI_TAG,             // Tag holds number of bytes
                MPI_CHAR,                   // type, may need to be more robust later
                status.MPI_SOURCE,          // source of send
                MPI_ANY_TAG,                // tag
                MPI_COMM_WORLD,             // COMM
                &status);                   // status

    // put the result from that task into the results vector
    results[jobList[status.MPI_SOURCE]] = toSend.getResult();

    // if there are more tasks to send, distribute the next one
    if (taskNum < taskCount ){
        toSend = tasks.at(taskNum);
        jobList[status.MPI_SOURCE] = taskNum;
        taskNum += 1;
    } else{ // otherwise send the stop task and decrement activePeons
        toSend = stopTask;
        activePeons -= 1;
    }

    // send the task, with the size of the task as the tag
    taskSize = sizeof(toSend);
    MPI_Send(&toSend, taskSize, MPI_CHAR, status.MPI_SOURCE, taskSize, MPI_COMM_WORLD);
}

In the peon function:

while (running){
    MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);    // tag holds data size

    incoming = (Task *) malloc(status.MPI_TAG);

    MPI_Recv(   incoming,           // memory location of input
                status.MPI_TAG,     // tag holds data size
                MPI_CHAR,           // type of data
                status.MPI_SOURCE,  // source is from distributor
                MPI_ANY_TAG,        // tag
                MPI_COMM_WORLD,     // comm
                &status);           // status

    task = Task(*incoming);

    if (task.getFlags() == STOP_FLAG){
        running = false;
        continue;
    }

    task.run();   // my task class has a "run" method
    MPI_Send(   &task,                  // string to send back
                status.MPI_TAG,         // size in = size out
                MPI_CHAR,               // data type
                status.MPI_SOURCE,      // destination
                status.MPI_TAG,         // tag again holds the data size for the master's probe
                MPI_COMM_WORLD);        // comm

    free(incoming);
}

There are some bool and int values that have to be assigned (and, as I said, I have a Task class), but this gives the basic structure of what I think you are trying to do.
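For completeness, a minimal sketch of what such a Task type could look like, so that sizeof(toSend), the MPI_CHAR sends, and task.run() above fit together. The field names, the STOP_FLAG value, and the placeholder run() body are illustrative assumptions, not part of the original answer; the important property is that the type is trivially copyable, since it is shipped as a raw stream of bytes:

    // Hypothetical Task: plain-old-data so it can be sent as raw MPI_CHAR bytes.
    const int STOP_FLAG = -1;      // assumed flag value carried by stopTask

    class Task {
    public:
        Task() : flags(0), input(0.0f), result(0.0f) {}
        explicit Task(int f) : flags(f), input(0.0f), result(0.0f) {}

        int   getFlags()  const { return flags; }
        float getResult() const { return result; }
        void  run()             { result = input * input; }  // placeholder work

        int   flags;    // STOP_FLAG tells a peon to shut down
        float input;    // data assigned by the taskMaster
        float result;   // answer computed by the peon
    };

stopTask would then simply be Task(STOP_FLAG), and running, activePeons, taskNum, and taskCount are the bool and int values mentioned above.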