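Are the terms "job", "task", and "step", as used in the SLURM documentation, related to each other?
AFAICT, a job may contain multiple tasks, and it is made up of multiple steps. But even assuming that is true, I am still not clear on how tasks and steps relate to each other.
An example showing the full complexity of jobs/tasks/steps would be helpful.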
Answer (score: 11)
A job consists of one or more steps, each step consists of one or more tasks, and each task uses one or more CPUs.
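Jobs are typically created with the `sbatch` command, and steps with the `srun` command. Tasks are requested with `--ntasks` (at either the job level or the step level), and CPUs per task with `--cpus-per-task`. Note that a job submitted with `sbatch` has one implicit step: the Bash script itself.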
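To make the task/CPU relationship concrete, here is a small arithmetic sketch. `step_cpus` is a hypothetical helper that does not call Slurm. It assumes the common default of one CPU per task when `--cpus-per-task` is not given:

```bash
#!/usr/bin/env bash
# Hypothetical helper: CPUs consumed by a request of N tasks with C CPUs each.
# Assumes one CPU per task when --cpus-per-task is not given.
step_cpus() {
  local ntasks=$1
  local cpus_per_task=${2:-1}
  echo $(( ntasks * cpus_per_task ))
}

# A job submitted with `sbatch --ntasks 4 --cpus-per-task 2` reserves 8 CPUs.
step_cpus 4 2   # prints 8
# A step launched inside it with `srun --ntasks 2 --cpus-per-task 2` uses 4 of them.
step_cpus 2 2   # prints 4
# A step launched with `srun --ntasks 3` and no --cpus-per-task uses 3.
step_cpus 3     # prints 3
```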
Consider this hypothetical job:
```bash
#!/bin/bash
#SBATCH --nodes 8
#SBATCH --tasks-per-node 8
# The job requests 64 CPUs (64 tasks with one CPU each) on 8 nodes.

# First step: a sub-allocation of 8 tasks (one per node) to create a tmp dir.
# No need for more than one task per node, but it has to run on every node.
srun --nodes 8 --ntasks 8 mkdir -p /tmp/$USER/$SLURM_JOBID

# Second step: the full allocation (64 tasks) runs an MPI
# program on some data to produce some output.
srun process.mpi <input.dat >output.txt

# Third step: a sub-allocation of 48 tasks (because, for instance,
# that program does not scale as well) post-processes the output and
# extracts meaningful information.
srun --ntasks 48 --nodes 6 --exclusive postprocess.mpi <output.txt >result.txt &

# Fourth step: a sub-allocation on a single node (because maybe it is a
# multithreaded program that cannot use CPUs on distinct nodes) compresses
# the raw output. It is one task with 8 threads, since this job holds
# 8 CPUs per node. It runs at the same time as the previous step
# thanks to the ampersand `&`.
OMP_NUM_THREADS=8 srun --ntasks 1 --cpus-per-task 8 --nodes 1 --exclusive compress output.txt &

# Wait for both background steps to finish before the script exits.
wait
```
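The numbers in the script's comments follow from simple arithmetic. This sketch assumes one CPU per task, because the script sets no `--cpus-per-task` in its `#SBATCH` lines. It also assumes that Slurm allocates exactly the requested CPUs rather than whole nodes, which depends on cluster configuration:

```bash
#!/usr/bin/env bash
# CPU bookkeeping for the example job above.
# Assumption: one CPU per task and no whole-node allocation.
nodes=8
tasks_per_node=8
cpus_per_task=1

cpus_per_node=$(( tasks_per_node * cpus_per_task ))
total_cpus=$(( nodes * cpus_per_node ))
echo "allocation: $total_cpus CPUs, $cpus_per_node per node"

# The third step asks for 48 single-CPU tasks on 6 nodes: exactly 6 full nodes.
step3_cpus=48
step3_nodes=$(( step3_cpus / cpus_per_node ))
free_nodes=$(( nodes - step3_nodes ))
echo "third step fills $step3_nodes nodes, leaving $free_nodes nodes for concurrent steps"

# A step confined to one node (--nodes 1) can use at most one node's share.
echo "max CPUs for a single-node step: $cpus_per_node"
```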
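Four steps were created, so the accounting information for this job (as shown by `sacct`) will have one line per step, plus one for the Bash script itself (the `.batch` step), in addition to the line for the job allocation as a whole.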
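You can list those lines with `sacct -j <jobid>`. To predict them from the script, the sketch below counts the `srun` lines in a condensed copy of the example, with step options trimmed. `count_steps` is a hypothetical helper that only reads text. The row formula assumes there is no extra `.extern` step, which some clusters add depending on their configuration:

```bash
#!/usr/bin/env bash
# Predict the sacct rows of a batch script by counting its srun steps.
# count_steps is a hypothetical helper; it only reads the script text.
count_steps() {
  # Strip comments (including #SBATCH lines), then count lines invoking srun.
  sed 's/#.*//' "$1" | grep -cE '(^|[[:space:]])srun([[:space:]]|$)'
}

# Condensed copy of the example job script (step options trimmed).
script=$(mktemp)
cat >"$script" <<'EOF'
#!/bin/bash
#SBATCH --nodes 8
#SBATCH --tasks-per-node 8
srun --nodes 8 --ntasks 8 mkdir -p /tmp/$USER/$SLURM_JOBID
srun process.mpi <input.dat >output.txt
srun --ntasks 48 --nodes 6 --exclusive postprocess.mpi <output.txt >result.txt &
srun --nodes 1 --exclusive compress output.txt &
wait
EOF

steps=$(count_steps "$script")
# One row for the allocation, one for the .batch step, one per srun step.
rows=$(( steps + 2 ))
echo "srun steps: $steps, expected sacct rows (without .extern): $rows"
rm -f "$script"
```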