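I'm having serious trouble writing data to files with MPI on a cluster that uses PBS. Below is a simple program that reproduces the problem, along with how I build and run it.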
Here is the program, which I compile with openmpi_intel-1.4.2:
#include <mpi.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
#include <cstdlib>

int main(int argc, char* argv[]){
    int rank;
    int size;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Define hostname
    char hostname[128];
    gethostname(hostname, 128);

    // check and create dump directory
    struct stat buf;
    int rc;
    const char *dir="Res";
    rc = stat( dir, &buf );
    if( rc ) // no dir, create
    {
        if( rank == 0 )
        {
            rc = mkdir( dir, 0771);
            if( rc )
            {
                std::ostringstream oss;
                oss << "Can't create dump directory \""
                    << dir
                    << "\"";
            }
        }
        else {
            sleep (2);
        }
    }
    else if( !S_ISDIR( buf.st_mode ) )
    {
        std::ostringstream oss;
        oss << "Path \""
            << dir
            << "\" is not directory for dump";
    }
    MPI_Barrier(MPI_COMM_WORLD);

    // Every process defines name of file for output (res_0, res_1, res_2.....)
    std::ostringstream filename;
    filename << dir << "/res_"<< rank;

    // Open file
    std::ofstream file(filename.str().c_str());

    // Output to file . Output seems like "I am 0 from 24. hostname"
    file << "I am " << rank << " from " << size << ". " << hostname << std::endl;
    file.close();

    MPI_Finalize();
    return 0;
}
I compile it with the command:
mpicxx -Wall test.cc -o test
Then I submit the program to the queue with this script:
#!/bin/bash
#PBS -N test
#PBS -l select=8:ncpus=6:mpiprocs=6
#PBS -l walltime=00:01:30
#PBS -m n
#PBS -e stderr.txt
#PBS -o stdout.txt
cd $PBS_O_WORKDIR
echo "I run on node: `uname -n`"
echo "My working directory is: $PBS_O_WORKDIR"
echo "Assigned to me nodes are:"
cat $PBS_NODEFILE
mpirun -hostfile $PBS_NODEFILE ./test
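I expect every rank to write its own res_<rank> file, but only the files from the first node (res_0 through res_5) are written; the rest are missing.
What is going wrong?
Thanks!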
Answer 0 (score: 1)
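OK, let's assume you are running on a file system that is coherently mounted on all of your compute nodes. That is the case, right? Then the main issue I see in your code snippet is that all processes stat the directory at the same time and then branch on the result, with rank 0 creating the directory if it is missing while the others merely sleep. I'm not sure exactly what happens, but I'm sure this isn't the smartest approach.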
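Since what you essentially want is a serial sanity check of the directory, plus its creation if needed, why not let the MPI process of rank 0 do it alone?
That would give you something like this: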
if ( rank == 0 ) { // Only master manages the directory creation
    int rc = stat( dir, &buf );
    ... // sanity check goes here and directory creation as well
    // calling MPI_Abort() in case of failure seems also a good idea
}
// all other processes wait here
MPI_Barrier( MPI_COMM_WORLD );
// now we know the directory exists and is accessible
// let's do our stuff
Does this work for you?