Question

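I am using sbatch to submit my job. On the command line, mpirun --version gives:

Intel(R) MPI Library for Linux* OS, Version 5.0 Build 20140507
Copyright (C) 2003-2014, Intel Corporation. All rights reserved.

So I believe I am using Intel MPI. Following the instructions "Submitting an MPI job using Intel MPI", I wrote this script: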

#!/bin/bash
#SBATCH --ntasks=4
#SBATCH -t 00:10:00

. ~/.bash_profile

module load intel
mpirun mycc

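mycc is the executable I got after compiling my source file with mpicc. I then submitted the job with the command sbatch -p partitionname -J myjob script.sh, and it failed with exit code 127:0. The slurm-jobid.out file says (ignoring the locale-setting warnings):

/usr/share/Modules/init/sh: line 2: /usr/bin/modulecmd: No such file or directory
/tmp/slurmd/job252624/slurm_script: line 10: mpirun: command not found

But I checked, and the file /usr/bin/modulecmd does exist. Any suggestions are appreciated.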

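Edit

I have also asked this question here.

I removed the source statement and the module load line from the script. Before submitting the job, I tried loading the module on the login node instead, but that did not work either. It says:

ModuleCmd_Load.c(204):ERROR:105: Unable to locate a modulefile for 'intel'

I used the module avail command to see which modules are available: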

--------- /usr/share/Modules/modulefiles -------------------
dot module-info mpich2-x86_64 use.own
module-cvs modules null
--------- /etc/modulefiles ---------------------------------
compat-openmpi-psm-x86_64 compat-openmpi-x86_64

Please forgive my messy formatting.

Solved

The problem is finally solved. My final script.sh looks like this:

#!/bin/bash
srun -p partitionname -n 4 -t 00:10:00 mycc

I then submit the job with the command sbatch -p partitionname -J myjob script.sh.

Answer 1

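Apparently /usr/bin/modulecmd does not exist on the compute nodes. The file you checked is on the login node, but the batch script runs on a compute node. Make sure it is present on every compute node, then try again.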
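One way to confirm this is to run the check from inside a job, so that it executes on a compute node rather than on the login node. The following is only a minimal sketch: check_path is a hypothetical helper name, not part of Slurm or Environment Modules, and the path is the one from the error message above. The check only fires when SLURM_JOB_ID is set, which Slurm does inside a job.

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH -t 00:01:00

# check_path is a hypothetical helper (an assumption for illustration).
# It reports whether a path exists and is executable on the current host.
check_path() {
    local path="$1"
    if [ -x "$path" ]; then
        echo "$(uname -n): $path found"
    else
        echo "$(uname -n): $path missing or not executable"
        return 1
    fi
}

# Only run the check when this script is executing inside a Slurm job.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    check_path /usr/bin/modulecmd
fi
```

Submitting this with sbatch -p partitionname and reading the resulting slurm-jobid.out should show which host ran the job and whether the file is visible there.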

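If /home is shared by all nodes, you also do not need to source bash_profile, because Slurm exports the submitting shell's entire environment to the job by default.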
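Because the environment is inherited, a batch script can simply verify that mpirun is on PATH before calling it, which turns a bare exit code 127 into a readable message. This is a sketch under assumptions: require_cmd is a hypothetical helper, and mycc is the executable from the question.

```shell
#!/bin/bash
#SBATCH --ntasks=4
#SBATCH -t 00:10:00

# require_cmd is a hypothetical helper (an assumption for illustration).
# It returns 127, the shell's "command not found" status, if the command
# is not on PATH, and prints the host name to help locate the problem.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "$1 not found in PATH on $(uname -n)" >&2
        return 127
    fi
}

# Only launch the MPI program when running inside a Slurm job.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    require_cmd mpirun && mpirun mycc
fi
```

If the message appears in slurm-jobid.out, the PATH seen by the job does not contain the MPI installation, either because it was not set in the submitting shell or because the directory is not available on the compute node.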
