How to know when PBS batch jobs are complete

Date: 2014-09-04 20:14:03

Tags: bash pbs

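I have a BASH script that submits multiple serial jobs to the PBS queueing system. Once the jobs are submitted, the script ends. The jobs then run on a cluster, and when they have all finished I can move on to the next step. A typical workflow may involve several of these steps.

My question:

Is there a way for my script, instead of exiting once it has finished submitting, to sleep until all the jobs it submitted have completed on the cluster, and only then exit?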

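3 answers:

Answer 0 (score: 1)

You're trying to build a workflow, right? The best way to do what you're attempting is to use job dependencies. Essentially, you submit X jobs, then submit more jobs that depend on the first set, and job dependencies let you express exactly that. There are several kinds of dependency, which you can read about at the job dependencies link, but here is an example that submits 3 jobs and then 3 more that will not run until the first 3 have exited: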

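#first batch (qsub prints the ID of each job it submits)
jobid1=$(qsub ...)
jobid2=$(qsub ...)
jobid3=$(qsub ...)

#next batch
# afterany: start only after all listed jobs have terminated
# (use afterok instead to require that they exited successfully)
depend_str="-W depend=afterany:${jobid1}:${jobid2}:${jobid3}"
# options must come before the job script on the qsub command line
qsub $depend_str ...
qsub $depend_str ...
qsub $depend_str ...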
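When the number of first-stage jobs is not fixed, the dependency list can be built in a loop instead of written out by hand. The sketch below assumes Torque or PBS Professional, where qsub prints the new job ID on stdout and "-W depend=afterany:id1:id2:..." holds a job until every listed job has terminated. The names depend_on and run_two_stages are hypothetical helpers, not part of PBS.

```shell
#!/bin/bash
# Hypothetical helper: join a dependency type and job IDs into a PBS
# dependency attribute, e.g. "depend=afterany:101.srv:102.srv".
depend_on() {
    local type=$1
    shift
    local IFS=:   # "$*" joins the remaining arguments with ':'
    printf 'depend=%s:%s\n' "$type" "$*"
}

# Hypothetical helper: submit stage-1 scripts (before "--"), then submit
# stage-2 scripts (after "--") held until every stage-1 job has ended.
run_two_stages() {
    local -a first=() ids=()
    local script dep
    # Collect stage-1 scripts up to the "--" separator.
    while [ "$#" -gt 0 ] && [ "$1" != "--" ]; do
        first+=("$1")
        shift
    done
    [ "$#" -gt 0 ] && shift          # drop the "--" itself
    for script in "${first[@]}"; do
        ids+=("$(qsub "$script")")   # qsub prints the new job ID
    done
    dep=$(depend_on afterany "${ids[@]}")
    # Remaining arguments are the stage-2 scripts.
    for script in "$@"; do
        qsub -W "$dep" "$script"
    done
}
```

For example, "run_two_stages step1a.sh step1b.sh step1c.sh -- step2a.sh step2b.sh step2c.sh" would reproduce the 3+3 example above; the script names are placeholders.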

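Answer 1 (score: 0)

One way to do this is with the GNU Parallel command 'sem'.

I also came across it for queueing tasks. It works like a semaphore: it lets a command run only once earlier commands have exited.

Edit: I know the example here is very basic, but a lot can be done by running tasks with parallel --semaphore (sem), or even with plain parallel. Have a look at this tutorial; I'm sure you'll find a relevant example there.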

There is a great tutorial here

An example from the tutorial:

  sem 'sleep 1; echo The first finished' &&
    echo The first is now running in the background &&
    sem 'sleep 1; echo The second finished' &&
    echo The second is now running in the background
  sem --wait

Output:

  The first is now running in the background
  The first finished
  The second is now running in the background
  The second finished

See Man Page
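One caveat when applying this to PBS: qsub returns as soon as the job is queued, so wrapping a plain qsub in sem only waits for the submissions, not for the cluster jobs. On PBS Professional, "qsub -W block=true" does not return until the job finishes (as far as I know, Torque's qsub has no such option), so the two could be combined roughly as below. This is a sketch under that assumption; the semaphore id pbsjobs, the limit of 10, and the function name submit_and_wait are hypothetical.

```shell
#!/bin/bash
# Sketch, assuming PBS Professional (for "qsub -W block=true") and GNU Parallel
# (for "sem"). Hypothetical helper: submit every script given as an argument
# and return only after all of the corresponding cluster jobs have finished.
submit_and_wait() {
    local script
    for script in "$@"; do
        # Each blocking qsub runs in the background under the semaphore
        # "pbsjobs"; at most 10 are outstanding at once.
        sem --id pbsjobs -j 10 qsub -W block=true "$script"
    done
    # Wait until every command started under this semaphore has exited.
    sem --id pbsjobs --wait
}
```

A script could then run "submit_and_wait job1.qsub job2.qsub job3.qsub; echo all jobs complete" (placeholder script names) and only reach the echo once the cluster jobs are done.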

Answer 2 (score: 0)

To actually check whether the jobs are complete, we need to run qstat with each job ID to get the job's status, and then grep that output for the C (completed) status code. As long as your username or job name is not "C", the following should work:

#!/bin/bash

# SECTION 1: Launch all jobs and store their job IDs in a variable
myJobs="job1.qsub job2.qsub job3.qsub"   # Your job names here
numJobs=$(echo "$myJobs" | wc -w)         # Count the jobs
myJobIDs=""                               # Initialize an empty list of job IDs
for job in $myJobs; do
    jobID_full=$(qsub "$job")
    # jobID_full will look like "12345.machinename", so use sed
    # to get just the numbers
    jobID=$(echo "$jobID_full" | sed -e 's|\([0-9]*\).*|\1|')
    myJobIDs="$myJobIDs $jobID"           # Add this job ID to our list
done

# SECTION 2: Check the status of each job, and exit the while loop only
# if they are all complete
numDone=0                                 # Initialize so that loop starts
while [ "$numDone" -lt "$numJobs" ]; do   # Less-than operator
    numDone=0                             # Zero since we will re-count each time
    for jobID in $myJobIDs; do            # Loop through each job ID
        # The following if-statement ONLY works if qstat won't return
        # the string ' C ' (a C surrounded by two spaces) in any
        # situation besides a completed job. I.e. if your username
        # or jobname is 'C' then this won't work!
        # Could add a check for error (grep -q ' E ') too if desired
        if qstat "$jobID" | grep -q ' C '; then
            (( numDone++ ))
        fi
    done
    # Report and wait once per pass, not once per unfinished job
    if [ "$numDone" -lt "$numJobs" ]; then
        echo "$numDone jobs completed out of $numJobs"
        sleep 1
    fi
done
echo all jobs complete
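As the comments in that script admit, grepping for ' C ' is fragile, and if the server purges completed jobs quickly (Torque's keep_completed setting), qstat stops reporting them and the polling loop never ends. A sturdier check, sketched here assuming Torque's "qstat -f" output with a "job_state = C" line, reads the state attribute directly and treats a job the server no longer knows as finished. The names job_done, wait_for_jobs, and POLL are hypothetical.

```shell
#!/bin/bash
# Sketch, assuming Torque-style "qstat -f" output containing a line such as
#     job_state = C
# Hypothetical helper: succeed if the job has completed or is no longer known
# to the server (e.g. already purged after completion).
job_done() {
    local state
    # A non-zero qstat exit usually means "Unknown Job Id"; note that this
    # would also treat a transient server error as "done".
    state=$(qstat -f "$1" 2>/dev/null) || return 0
    state=$(printf '%s\n' "$state" | awk -F' = ' '/job_state = / {print $2}')
    [ "$state" = C ]
}

# Hypothetical helper: block until every job ID given as an argument is done,
# polling the server every POLL seconds (default 30).
wait_for_jobs() {
    local id pending
    while :; do
        pending=0
        for id in "$@"; do
            job_done "$id" || pending=$((pending + 1))
        done
        [ "$pending" -eq 0 ] && break
        echo "$pending of $# jobs still pending"
        sleep "${POLL:-30}"
    done
}
```

With the IDs collected in SECTION 1, "wait_for_jobs $myJobIDs" (left unquoted so the list splits into separate IDs) could stand in for SECTION 2. Polling every 30 seconds rather than every second also keeps the load on a shared PBS server lower.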
