OpenMPI computer-specific runtime error

Date: 2014-01-24 22:28:22

Tags: openmpi

Thank you for reading my post. I have just started using Open MPI. I installed Open MPI 1.6.5 on both my Mac (OS X 10.5.8) and my Linux machine (Mint 14). Both computers can compile and run very simple programs, such as Hello World or sending an integer from one process to another. However, whenever I try to send an array using MPI_Bcast() or MPI_Send(), the program throws a segmentation fault.

#include <iostream>
#include <stdlib.h>
#include <mpi.h>
using namespace std;

int main(int argc,char** argv)
{
    int np,nid;
    float *a;

    MPI_Init(&argc,&argv);
    MPI_Comm_size(MPI_COMM_WORLD,&np);
    MPI_Comm_rank(MPI_COMM_WORLD,&nid); 

    if (nid == 0)
    {
        a = (float*) calloc(9,sizeof(float));
        for (int i = 0; i < 9; i++)
        {
            a[i] = i;
        }
    }

    MPI_Bcast(a,9,MPI_FLOAT,0,MPI_COMM_WORLD);  

    MPI_Finalize();
    return 0;
}

Here is the error message:

[rsove-M11BB:02854] *** Process received signal ***
[rsove-M11BB:02854] Signal: Segmentation fault (11)
[rsove-M11BB:02854] Signal code: Address not mapped (1)
[rsove-M11BB:02854] Failing at address: (nil)
[rsove-M11BB:02855] *** Process received signal ***
[rsove-M11BB:02855] Signal: Segmentation fault (11)
[rsove-M11BB:02855] Signal code: Address not mapped (1)
[rsove-M11BB:02855] Failing at address: (nil)
[rsove-M11BB:02854] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7fddf08f64a0]
[rsove-M11BB:02854] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x142953) [0x7fddf0a02953]
[rsove-M11BB:02854] [ 2] /usr/local/openmpi/lib/libmpi.so.1(opal_convertor_unpack+0x105) [0x7fddf12a0b35]
[rsove-M11BB:02854] [ 3] /usr/local/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_match+0x415) [0x7fddece38ee5]
[rsove-M11BB:02854] [ 4] /usr/local/openmpi/lib/openmpi/mca_btl_sm.so(mca_btl_sm_component_progress+0x23d) [0x7fddec61477d]
[rsove-M11BB:02854] [ 5] /usr/local/openmpi/lib/libmpi.so.1(opal_progress+0x5a) [0x7fddf12ac2ea]
[rsove-M11BB:02854] [ 6] /usr/local/openmpi/lib/libmpi.so.1(ompi_request_default_wait+0x11d) [0x7fddf11fce2d]
[rsove-M11BB:02854] [ 7] /usr/local/openmpi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_generic+0x4d6) [0x7fddeb73e346]
[rsove-M11BB:02854] [ 8] /usr/local/openmpi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_binomial+0xcb) [0x7fddeb73e85b]
[rsove-M11BB:02854] [ 9] /usr/local/openmpi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_dec_fixed+0xcc) [0x7fddeb735b5c]
[rsove-M11BB:02854] [10] /usr/local/openmpi/lib/openmpi/mca_coll_sync.so(mca_coll_sync_bcast+0x79) [0x7fddeb951799]
[rsove-M11BB:02854] [11] /usr/local/openmpi/lib/libmpi.so.1(MPI_Bcast+0x148) [0x7fddf12094d8]
[rsove-M11BB:02854] [12] Test(main+0xb4) [0x408f90]
[rsove-M11BB:02854] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7fddf08e176d]
[rsove-M11BB:02854] [14] Test() [0x408df9]
[rsove-M11BB:02854] *** End of error message ***
[rsove-M11BB:02855] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7fa4c67be4a0]
[rsove-M11BB:02855] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x142953) [0x7fa4c68ca953]
[rsove-M11BB:02855] [ 2] /usr/local/openmpi/lib/libmpi.so.1(opal_convertor_unpack+0x105) [0x7fa4c7168b35]
[rsove-M11BB:02855] [ 3] /usr/local/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_match+0x415) [0x7fa4c2d00ee5]
[rsove-M11BB:02855] [ 4] /usr/local/openmpi/lib/openmpi/mca_btl_sm.so(mca_btl_sm_component_progress+0x23d) [0x7fa4c24dc77d]
[rsove-M11BB:02855] [ 5] /usr/local/openmpi/lib/libmpi.so.1(opal_progress+0x5a) [0x7fa4c71742ea]
[rsove-M11BB:02855] [ 6] /usr/local/openmpi/lib/libmpi.so.1(ompi_request_default_wait+0x11d) [0x7fa4c70c4e2d]
[rsove-M11BB:02855] [ 7] /usr/local/openmpi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_generic+0x59c) [0x7fa4c160640c]
[rsove-M11BB:02855] [ 8] /usr/local/openmpi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_binomial+0xcb) [0x7fa4c160685b]
[rsove-M11BB:02855] [ 9] /usr/local/openmpi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_dec_fixed+0xcc) [0x7fa4c15fdb5c]
[rsove-M11BB:02855] [10] /usr/local/openmpi/lib/openmpi/mca_coll_sync.so(mca_coll_sync_bcast+0x79) [0x7fa4c1819799]
[rsove-M11BB:02855] [11] /usr/local/openmpi/lib/libmpi.so.1(MPI_Bcast+0x148) [0x7fa4c70d14d8]
[rsove-M11BB:02855] [12] Test(main+0xb4) [0x408f90]
[rsove-M11BB:02855] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7fa4c67a976d]
[rsove-M11BB:02855] [14] Test() [0x408df9]
[rsove-M11BB:02855] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 2854 on node rsove-M11BB exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

The strange thing is that when I run the same code on my friend's computer, it compiles and runs without any problems.

Thanks in advance for your help.

1 Answer:

Answer 0 (score: 2)

You are making a very typical mistake. MPI_Bcast requires that an already allocated buffer be passed as its first argument on the root rank and on all other ranks alike. In your code, a is allocated only on rank 0; on every other rank it is an uninitialized pointer, and having MPI_Bcast write through it is undefined behaviour, which also explains why the program happens to run on some machines and crashes on others. The code must therefore be modified, for example like this:

// Allocate the array everywhere
a = (float*) calloc(9,sizeof(float));
// Initialise the array at rank 0 only
if (nid == 0)
{
    for (int i = 0; i < 9; i++)
    {
        a[i] = i;
    }
}
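
For completeness, here is a minimal self-contained sketch of the corrected program built from the snippet above (the file name Test.cpp and the rank count in the run command are just for illustration):

#include <stdlib.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int np, nid;
    float *a;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Comm_rank(MPI_COMM_WORLD, &nid);

    // Allocate the buffer on every rank, not just the root:
    // MPI_Bcast writes into this memory on all non-root ranks
    a = (float*) calloc(9, sizeof(float));

    // Initialise the values on rank 0 only
    if (nid == 0)
    {
        for (int i = 0; i < 9; i++)
        {
            a[i] = i;
        }
    }

    // Collective call: every rank now passes a valid 9-float buffer
    MPI_Bcast(a, 9, MPI_FLOAT, 0, MPI_COMM_WORLD);

    free(a);
    MPI_Finalize();
    return 0;
}

Compiled and run, for example, with:

mpic++ Test.cpp -o Test
mpirun -np 2 ./Test

The same rule applies to point-to-point communication: the buffer handed to MPI_Recv on the receiving rank must already be allocated before the call, which is why MPI_Send of an array fails in the same way if the receiver only declares float *a without allocating it.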