OpenMPI - same rank in different processes

Time: 2018-03-13 16:21:31

Tags: c operating-system openmpi

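I have been working with OpenMPI for a while, and I am not getting the expected behaviour when I query the rank in my processes.

I have a simple C program that should print the rank of each process: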

minimal.c:

#include <stdio.h>
#include <mpi.h>

int
main (int argc, char *argv[])
{
    /* MPI_Comm_size and MPI_Comm_rank write through int pointers */
    int procs;
    int self;
    MPI_Comm com;

    /* Initialize MPI and query the size of the world and our rank in it */
    MPI_Init (&argc, &argv);
    com = MPI_COMM_WORLD;
    MPI_Comm_size (com, &procs);
    MPI_Comm_rank (com, &self);

    printf("My rank is %d\n", self);

    /* Shut MPI down before exiting */
    MPI_Finalize();
    return 0;
}

I compile it with:

mpicc minimal.c -o minimal

Now, if I run the following command on my own computer:

mpirun -np 2 minimal

I get the following output:

$ mpirun -np 2 minimal
My rank is 0
My rank is 0

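Which I find quite disturbing.

So I went digging through the mpirun manual and ended up printing additional information with -display-devel-map -report-bindings. This is the trace I got: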

$ mpirun -np 2 -display-devel-map -report-bindings minimal
 Data for JOB [53858,1] offset 0

 Mapper requested: NULL  Last mapper: round_robin  Mapping policy: BYCORE  Ranking policy: SLOT
 Binding policy: CORE:IF-SUPPORTED  Cpu set: NULL  PPR: NULL  Cpus-per-rank: 1
  Num new daemons: 0  New daemon starting vpid INVALID
  Num nodes: 1

 Data for node: UX31A     Launch id: -1   State: 2
  Daemon: [[53858,0],0]   Daemon launched: True
  Num slots: 2    Slots in use: 2 Oversubscribed: FALSE
  Num slots allocated: 2  Max slots: 0
  Username on node: NULL
  Num procs: 2    Next node_rank: 2
  Data for proc: [[53858,1],0]
      Pid: 0  Local rank: 0   Node rank: 0    App rank: 0
      State: INITIALIZED  App_context: 0
      Locale: [BB/..]
      Binding: [BB/..]
  Data for proc: [[53858,1],1]
      Pid: 0  Local rank: 1   Node rank: 1    App rank: 1
      State: INITIALIZED  App_context: 0
      Locale: [../BB]
      Binding: [../BB]
[UX31A:04861] MCW rank 1 bound to socket 0[core 1[hwt 0-1]]: [../BB]
[UX31A:04861] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: [BB/..]
My rank is 0
My rank is 0

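Which leaves me puzzled.

I am using Ubuntu 16.04 and the OpenMPI packages from the apt repos. My computer is an ASUS UX31A.

If anyone could give me some information about what is going on here, I would be very grateful.

Thanks!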

1 Answer:

Answer 0: (score: 0)

Thanks to Gilles Gouaillardet, I finally found out what was going on!

It turns out I had the mpich libraries and the openmpi binaries installed!

Here is what I did:

  1. Check which library my binary is linked against:

    $ ldd minimal
    ...
    libmpich.so.12 => /usr/lib/x86_64-linux-gnu/libmpich.so.12
    ...

    $ dpkg -S /usr/lib/x86_64-linux-gnu/libmpich.so.12
    libmpich12:amd64: /usr/lib/x86_64-linux-gnu/libmpich.so.12.1.0

  2. Check which package provides my mpicc and mpirun binaries:

    $ which mpirun
    /usr/bin/mpirun

    $ dpkg -S mpirun
    openmpi-bin: /usr/bin/mpirun.openmpi
    ...

  3. I removed the installed mpich packages:

    sudo apt-get remove libmpich12 libmpich-dev

  4. I installed the openmpi libraries I needed:

    sudo apt-get install libopenmpi-dev

  5. Once that was done, I compiled again:

    $ mpicc minimal.c -o minimal
    $ mpirun -np 2 minimal
    My rank is 0
    My rank is 1
    

    Hurray!