Strange segmentation fault in MPI

Time: 2012-08-31 09:24:17

Tags: c++ segmentation-fault mpi

I wrote a simple MPI program to practice with MPI's user-defined datatype functions. Below is the version that throws a segmentation fault.

    #include <mpi.h>
    #include <iostream>

    using namespace std;

    int main( int argc , char ** argv )
    {
        int rank;

        MPI_Datatype newtype;
        MPI_Datatype newertype;

        MPI_Init(&argc,&argv);

        MPI_Comm_rank(MPI_COMM_WORLD,&rank);

        MPI_Type_contiguous(2,MPI_INT,&newtype);
        MPI_Type_commit(&newtype);
        MPI_Type_vector(3,2,3,newtype,&newertype);
        MPI_Type_commit(&newertype);

        int * buffer = new int[16];

        for( int i=0 ; i<16 ; i++ )
        {
            buffer[i] = 0;
        }

        if(rank==0)
        {
            for( int i=0 ; i<16 ; i++ )
            {
                buffer[i] = 9;
            }

            MPI_Send(buffer,3,newertype,1,0,MPI_COMM_WORLD);        

        }else if(rank==1)
        {
            MPI_Recv(buffer,3,newertype,0,0,MPI_COMM_WORLD,MPI_STATUS_IGNORE);

            for( int i=0 ; i<16 ; i++ )
            {
                cout << buffer[i] << " ";
            }

            cout << endl;

        }

        MPI_Type_free(&newertype);
        MPI_Type_free(&newtype);

        MPI_Finalize();

        return 0;
    }

However, when the array declaration is moved before the call to MPI_Init, everything works fine. (The moved lines are marked with a comment in the listing below.)

    #include <mpi.h>
    #include <iostream>

    using namespace std;

    int main( int argc , char ** argv )
    {
        int rank;

        // these lines were moved up from after the type commits
        int * buffer = new int[16];

        for( int i=0 ; i<16 ; i++ )
        {
            buffer[i] = 0;
        }

        MPI_Datatype newtype;
        MPI_Datatype newertype;

        MPI_Init(&argc,&argv);

        MPI_Comm_rank(MPI_COMM_WORLD,&rank);

        MPI_Type_contiguous(2,MPI_INT,&newtype);
        MPI_Type_commit(&newtype);
        MPI_Type_vector(3,2,3,newtype,&newertype);
        MPI_Type_commit(&newertype);

        if(rank==0)
        {
            for( int i=0 ; i<16 ; i++ )
            {
                buffer[i] = 9;
            }

            MPI_Send(buffer,3,newertype,1,0,MPI_COMM_WORLD);

        }else if(rank==1)
        {
            MPI_Recv(buffer,3,newertype,0,0,MPI_COMM_WORLD,MPI_STATUS_IGNORE);

            for( int i=0 ; i<16 ; i++ )
            {
                cout << buffer[i] << " ";
            }

            cout << endl;

        }

        MPI_Type_free(&newertype);
        MPI_Type_free(&newtype);

        MPI_Finalize();

        return 0;
    }

Can someone explain what is wrong with declaring the array after the call to MPI_Init?

For your information, here is the error message:

    9 9 9 9 0 0 9 9 9 9 0 0 9 9 9 9
    [linuxscc003:10019] *** Process received signal ***
    [linuxscc003:10019] Signal: Segmentation fault (11)
    [linuxscc003:10019] Signal code: Address not mapped (1)
    [linuxscc003:10019] Failing at address: 0x7fa00d0b36c8
    [linuxscc003:10019] [ 0] /lib64/libpthread.so.0() [0x3abf80f500]
    [linuxscc003:10019] [ 1] /opt/MPI/openmpi-1.5.3/linux/gcc/lib/libmpi.so.1(opal_memory_ptmalloc2_int_free+0x299) [0x7f980ce46509]
    [linuxscc003:10019] [ 2] /opt/MPI/openmpi-1.5.3/linux/gcc/lib/libmpi.so.1(+0xe7b2b) [0x7f980ce46b2b]
    [linuxscc003:10019] [ 3] /opt/MPI/openmpi-1.5.3/linux/gcc/lib/libmpi.so.1(+0xf0a60) [0x7f980ce4fa60]
    [linuxscc003:10019] [ 4] /opt/MPI/openmpi-1.5.3/linux/gcc/lib/libmpi.so.1(mca_base_param_finalize+0x41) [0x7f980ce4f731]
    [linuxscc003:10019] [ 5] /opt/MPI/openmpi-1.5.3/linux/gcc/lib/libmpi.so.1(opal_finalize_util+0x1b) [0x7f980ce3f53b]
    [linuxscc003:10019] [ 6] /opt/MPI/openmpi-1.5.3/linux/gcc/lib/libmpi.so.1(+0x4ce35) [0x7f980cdabe35]
    [linuxscc003:10019] [ 7] type_contiguous(main+0x1aa) [0x408f2e]
    [linuxscc003:10019] [ 8] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3abec1ecdd]
    [linuxscc003:10019] [ 9] type_contiguous() [0x408cc9]
    [linuxscc003:10019] *** End of error message ***
    --------------------------------------------------------------------------
    mpiexec noticed that process rank 1 with PID 10019 on node linuxscc003 exited on signal 11 (Segmentation fault).
    --------------------------------------------------------------------------
    Failure executing command /opt/MPI/openmpi-1.5.3/linux/gcc/bin/mpiexec -x  LD_LIBRARY_PATH -x  PATH -x  OMP_NUM_THREADS -x  MPI_NAME --hostfile /tmp/hostfile-9252 -np 2 type_contiguous

1 Answer:

Answer 0 (score: 3):

newertype consists of 3 segments of 2 newtype elements each, with a stride of 3, so its extent is ((3-1)*3 + 2) = 8 elements of newtype. Each newtype is two consecutive MPI_INT elements, so a single newertype spans 16 integers; that is exactly the 9/0 pattern that rank 1 printed before crashing. You are sending 3 elements of newertype, and consecutive elements are laid out one extent apart, so the send and receive operations touch 3*8 = 24 elements of newtype, or 48 integers, counting from the start of the buffer.

Your send and receive buffers should therefore be at least 48 integers long, but you only allocate 16. The MPI_Recv in rank 1 writes past the end of the allocated buffer, most likely overwriting heap control structures. Moving the allocation before the call to MPI_Init changes the order of those structures in memory, so your code now overwrites something different but non-essential. The code is still incorrect; you are just lucky that it no longer segfaults. Use larger buffers of at least 48 integers.
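
As one way to fix it, here is a minimal sketch that keeps the question's datatype definitions but asks MPI for the extent with MPI_Type_get_extent and sizes the buffer from that value, instead of hard-coding 16 integers:

    #include <mpi.h>
    #include <iostream>

    using namespace std;

    int main( int argc , char ** argv )
    {
        int rank;
        MPI_Datatype newtype, newertype;

        MPI_Init(&argc,&argv);
        MPI_Comm_rank(MPI_COMM_WORLD,&rank);

        MPI_Type_contiguous(2,MPI_INT,&newtype);
        MPI_Type_commit(&newtype);
        MPI_Type_vector(3,2,3,newtype,&newertype);
        MPI_Type_commit(&newertype);

        // extent of one newertype: ((3-1)*3 + 2) = 8 newtypes = 16 ints
        MPI_Aint lb, extent;
        MPI_Type_get_extent(newertype,&lb,&extent);

        // 3 elements of newertype span 3*extent bytes, i.e. 48 ints here,
        // so size the buffer from the queried extent instead of guessing
        const int n = static_cast<int>( 3 * extent / sizeof(int) );
        int * buffer = new int[n]();   // value-initialized to 0

        if(rank==0)
        {
            for( int i=0 ; i<n ; i++ ) buffer[i] = 9;
            MPI_Send(buffer,3,newertype,1,0,MPI_COMM_WORLD);
        }
        else if(rank==1)
        {
            MPI_Recv(buffer,3,newertype,0,0,MPI_COMM_WORLD,MPI_STATUS_IGNORE);
            for( int i=0 ; i<n ; i++ ) cout << buffer[i] << " ";
            cout << endl;
        }

        delete [] buffer;
        MPI_Type_free(&newertype);
        MPI_Type_free(&newtype);
        MPI_Finalize();
        return 0;
    }

Run with 2 processes, rank 1 should print 48 values: the 16-integer 9/0 pattern from your original output, repeated three times.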