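I am using MPI (Open MPI 3.0.1) from C for my work, and I would like to know how to set up MPI_Comm_spawn so that it behaves as if I had launched my executable with mpirun.
I use this hostfile: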
# This is a single processor node:
# This is a height-processor node. Oversubscribing
# to it is prevented by setting max-slots=4:
# host | rank | nb procs | max nb procs
node7 0,2 slots=2 max-slots=4
# This is a height-processor node. Oversubscribing
# to it is prevented by setting max-slots=8:
# host | rank | nb procs | max nb procs
node8 1,3,4,5,6,7 slots=6 max-slots=8
So when I launch hostname with
mpirun -hostfile /path/to/mpi_hostfile hostname
I get
node7
node7
node8
node8
node8
node8
node8
node8
But when I spawn new processes (warning: this is pseudocode)
maxprocs = to_unsigned(arg)
info_key = "add-hostfile"
info_value = "/path/to/mpi_hostfile"
mpi_comm_spawn(....)
This is the code I use:
mpi->comm_spawn->set_attr_command(argv[0]);
mpi->comm_spawn->set_attr_argv(argv + 1);
mpi->comm_spawn->set_attr_maxprocs(universe_size - 1);
if (mc_options.hostfile == nullptr)
  mpi->comm_spawn->set_attr_info(spot::mpi::info_null);
else
  {
    // OpenMPI specific
    const char* hostfile = "add-hostfile";
    mpi->info_create->set_attr_info(&info);
    mpi->info_create->do_info_create();
    mpi->info_set->set_attr_info(info);
    mpi->info_set->set_attr_key(hostfile);
    mpi->info_set->set_attr_value(mc_options.hostfile);
    mpi->info_set->do_info_set();
    mpi->comm_spawn->set_attr_info(info);
  }
mpi->comm_spawn->set_attr_root(master);
mpi->comm_spawn->set_attr_comm(spot::mpi::comm_self);
mpi->comm_spawn->set_attr_intercomm(&children);
mpi->comm_spawn->
  set_attr_array_of_errcodes(spot::mpi::errcodes_ignore);
mpi->comm_spawn->do_comm_spawn();
All the processes end up on node7, and I cannot start more than 2 processes (the number of slots I specified for node7 in mpi_hostfile):
Process #0: on HOST node7
Process #1: on HOST node7
If I try to launch more:
--------------------------------------------------------------------------
There are not enough slots available in the system to satisfy the 3 slots
that were requested by the application:
/* my executable*/
Either request fewer slots for your application, or make more slots available
for use.
--------------------------------------------------------------------------
[node7:2306] *** An error occurred in MPI_Comm_spawn
[node7:2306] *** reported by process [4042260481,0]
[node7:2306] *** on communicator MPI_COMM_SELF
[node7:2306] *** MPI_ERR_SPAWN: could not spawn processes
[node7:2306] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[node7:2306] *** and potentially your MPI job)
I expected
node7
node7
node8
node8
node8
node8
node8
node8
just as with mpirun.
I don't understand why the behavior is different.
Thanks.