How do the terms "job", "task", and "step" relate to each other?

Time: 2017-09-30 20:30:11

Tags: slurm

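How do the terms "job", "task", and "step", as used in the SLURM documentation, relate to each other?

AFAICT, a job may consist of multiple tasks, and it also consists of multiple steps, but, assuming this is true, it is still not clear to me how tasks and steps are related.

It would be helpful to see an example showing the full complexity of jobs/tasks/steps.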

1 answer:

Answer 0: (score: 11)

A job consists of one or more steps, each step consists of one or more tasks, and each task uses one or more CPUs.

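Jobs are typically created with the sbatch command, steps are created with the srun command, tasks are requested with --ntasks (at the job level or at the step level), and CPUs are requested per task with --cpus-per-task. Note that a job submitted with sbatch has one implicit step: the Bash script itself.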
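As a minimal sketch of how the task and CPU counts combine: the values 4 and 2 are made up for illustration, and the fallback defaults are an assumption so that the script also runs outside of Slurm. A batch script could read the counts back from the SLURM_NTASKS and SLURM_CPUS_PER_TASK environment variables:

```shell
#!/bin/bash
#SBATCH --ntasks=4          # four tasks for the whole job (illustrative value)
#SBATCH --cpus-per-task=2   # two CPUs for each task (illustrative value)

# Inside a job, Slurm exports SLURM_NTASKS when --ntasks is given and
# SLURM_CPUS_PER_TASK when --cpus-per-task is given. The fallbacks below are
# an assumption for illustration so the script can be tried outside Slurm.
ntasks="${SLURM_NTASKS:-4}"
cpus_per_task="${SLURM_CPUS_PER_TASK:-2}"

# Each task gets cpus_per_task CPUs, so the job as a whole uses the product.
total_cpus=$(( ntasks * cpus_per_task ))
echo "tasks=${ntasks} cpus_per_task=${cpus_per_task} total_cpus=${total_cpus}"
```

Any srun call placed after this in the script would start a step, and that step could use all of these tasks or only a subset of them.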

Assume the following hypothetical job:

#!/bin/bash
#SBATCH --nodes 8
#SBATCH --ntasks-per-node 8
# The job requests 64 CPUs, on 8 nodes.

# First step, with a sub-allocation of 8 tasks (one per node) to create a tmp dir.
# No need for more than one task per node, but it has to run on every node
srun --nodes 8 --ntasks 8 mkdir -p "/tmp/$USER/$SLURM_JOBID"

# Second step with the full allocation (64 tasks) to run an MPI 
# program on some data to produce some output.
srun process.mpi <input.dat >output.txt

# Third step with a sub allocation of 48 tasks (because for instance 
# that program does not scale as well) to post-process the output and 
# extract meaningful information
srun --ntasks 48 --nodes 6 --exclusive postprocess.mpi <output.txt >result.txt &

# Fourth step with a sub-allocation on a single node (because maybe
# it is a multithreaded program that cannot use CPUs on distinct nodes)
# to compress the raw output. This step runs at the same time as
# the previous one thanks to the ampersand `&`
OMP_NUM_THREADS=12 srun --ntasks 12 --nodes 1 --exclusive compress output.txt &

wait

Four steps are created, so the accounting information for that job will have 5 lines: one per step, plus one for the Bash script itself.
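To look at those accounting lines yourself, something along these lines may help. It is only a sketch: step_id and show_steps are hypothetical helper names, the column list is just one possible choice of sacct fields, and the exact rows you see can depend on how the cluster is configured.

```shell
#!/bin/bash
# Build the identifier Slurm uses for a step: <jobid>.<step>, where <step> is
# a number (0, 1, 2, ...) for srun steps or "batch" for the script itself.
# (step_id is a hypothetical helper, not part of Slurm.)
step_id() {
    printf '%s.%s\n' "$1" "$2"
}

# Print the accounting rows of the given job (needs access to a Slurm cluster).
# (show_steps is a hypothetical wrapper around the real sacct command.)
show_steps() {
    sacct -j "$1" --format=JobID,JobName,NTasks,AllocCPUS,State
}

# Only query accounting when sacct is installed and a job id was supplied.
if command -v sacct >/dev/null 2>&1 && [ -n "${1:-}" ]; then
    show_steps "$1"
fi
```

Saved as, say, steps.sh, it would be called with the job id as its only argument. A single step can also be queried by passing an id such as the one printed by `step_id 1234 0` to `sacct -j`.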