MPI hangs during execution

Posted: 2013-05-26 09:38:32

Tags: debugging mpi hang

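I'm trying to write a simple MPI program that finds all numbers below 514 that are equal to a power of the sum of their digits (for example, 512 = (5 + 1 + 2)^3). The problem is the main loop: it runs fine for a few iterations (c = 10), but when I increase the number of iterations (c = x), mpiexec.exe just hangs, seemingly in the middle of a printf call.

I'm fairly sure a deadlock is to blame, but I can't find one.

Source code: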

#include <stdlib.h>
#include <stdio.h>
#include <iostream>
#include "mpi.h"

int main(int argc, char* argv[])
{
    //our number
    int x=514;
    //amount of iterations
    int c = 10;
    //tags for message identification
    int tag = 42;
    int tagnumber = 43;
    int np, me, y1, y2;
    MPI_Status status;

    /* Initialize MPI */
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Comm_rank(MPI_COMM_WORLD, &me);
    /* Check that we run on at least two processes */
    if (np < 2)
    {
        printf("You have to use at least 2 processes to run this program\n");
        MPI_Finalize();
        exit(0);
    }
    //begin iterations
    while(c>0)
    {
        //if this is the main process (rank 0), send messages to all other processes
        if (me == 0)
        { 
            printf("Amount of threads: %d\n", np);
            int b = 1;
            while(b<np)
            {
                int q = x-b;
                //sends a number to a secondary thread
                MPI_Send(&q, 1, MPI_INT, b, tagnumber, MPI_COMM_WORLD);
                printf("Process %d sending to process %d, value: %d\n", me, b, q);
                //get a number from secondary thread
                MPI_Recv(&y2, 1, MPI_INT, b, tag, MPI_COMM_WORLD, &status);
                printf ("Process %d received value %d\n", me, y2);
                //compare it with the sent one
                if (q==y2)
                {
                    //if they're equal, then print the result
                    printf("\nValue found: %d\n", q);
                }
                b++;
            }
            x = x-b+1;
            b = 1;
        }
        else
        {
            //if not the main process, process the message received and send the result back.
            MPI_Recv (&y1, 1, MPI_INT, 0, tagnumber, MPI_COMM_WORLD, &status);
            int sum = 0;
            int y2 = y1;
            while (y1!=0)
            {
                //find the number's sum of digits
                sum += y1%10;
                y1 /= 10;
            }
            int sum2 = sum;
            while(sum2<y2)
            {
                //calculate the exponentiation
                sum2 = sum2*sum;
            }
            MPI_Send (&sum2, 1, MPI_INT, 0, tag, MPI_COMM_WORLD);
        }
        c--;
    }
    MPI_Finalize();
    exit(0);
}

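I run the compiled executable as "mpiexec.exe -n 4 lab2.exe". I'm using the HPC Pack 2008 SDK, in case that is useful.

Is there any way to fix this? Or is there some way to debug this kind of situation properly?

Thanks a lot in advance!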

1 Answer:

Answer 0 (score: 1)

I'm not sure whether you've already found the problem, but your infinite run happens in this loop:

while(sum2<y2)
{
    //calculate the exponentiation
    sum2 = sum2*sum;
}
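
One case in which this loop cannot terminate is a digit sum of 1 (for example 100 or 10): sum2 stays at 1 no matter how often it is multiplied by sum, so sum2 < y2 holds forever, the worker never reaches MPI_Send, and rank 0 stays blocked in MPI_Recv. Below is a minimal sketch of a guarded check with the MPI calls left out. The helper names digit_sum and is_digit_sum_power are hypothetical and not part of the original program.

```c
/* Sum of the decimal digits of a non-negative integer. */
int digit_sum(int n)
{
    int sum = 0;
    while (n != 0) {
        sum += n % 10;
        n /= 10;
    }
    return sum;
}

/* Hypothetical helper: returns 1 if n equals some power (exponent >= 1)
   of its own digit sum, and 0 otherwise. */
int is_digit_sum_power(int n)
{
    if (n < 0)
        return 0;                 /* negative inputs are out of scope here */

    int sum = digit_sum(n);
    if (sum <= 1)
        return n == sum;          /* powers of 0 or 1 never grow: do not loop */

    long long power = sum;        /* wider type so the product cannot overflow */
    while (power < n)
        power *= sum;
    return power == n;
}
```

In the worker branch, the flag returned by is_digit_sum_power(y1) could be sent back in place of sum2, with rank 0 testing the flag rather than comparing q == y2. That change to the message protocol is only a suggestion.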

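To confirm that the hang is inside this while loop, set c to roughly 300 or more and add a printf call inside the loop. I haven't fully worked out your logic error, but I've marked three places in your code with comments that look odd to me: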

while(c>0)
{
    if (me == 0)
    { 
        ...
        while(b<np)
        {
            int q = x-b; //<-- you subtract b from x here
            ...
            b++;
        }
        x = x-b+1; //<-- you subtract b again. sure this is what you want?
        b = 1; //<-- this is useless
    }
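
On the second comment: when the inner loop exits, b equals np, so x = x-b+1 lowers x by np-1, which is exactly the number of values just sent. As far as I can tell, no candidates are skipped. The same bookkeeping can be replayed without MPI to see in which outer iteration a problematic value such as 100 (digit sum 1) is first handed to a worker. The function name iteration_sending and its max_iters parameter are made up for this sketch.

```c
/* Hypothetical helper: replays rank 0's bookkeeping from the question and
   returns the 1-based outer iteration in which `target` is sent to a worker,
   or 0 if it is not sent within max_iters iterations. */
int iteration_sending(int x, int np, int target, int max_iters)
{
    for (int iter = 1; iter <= max_iters; ++iter) {
        int b = 1;
        while (b < np) {
            if (x - b == target)  /* q = x - b is the value sent to rank b */
                return iter;
            b++;
        }
        x = x - b + 1;            /* b == np here, so x drops by np - 1 */
    }
    return 0;
}
```

With the question's settings (x = 514 and mpiexec -n 4), iteration_sending(514, 4, 100, 514) returns 138, which would explain why c = 10 finishes while much larger values of c hang.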

Hope this helps.