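I have a BASH script that submits several serial jobs to the PBS queueing system. Once the jobs are submitted, the script ends. The jobs then run on a cluster, and when they have all finished I can move on to the next step. A typical workflow may involve several such steps.
My question:
Is there a way for my script, instead of exiting once it has finished submitting, to sleep until all of the jobs it submitted have completed on the cluster, and only then exit?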
Answer 0 (score: 1)
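You are trying to set up a workflow, right? The best way to do what you are trying to accomplish is to use job dependencies. Essentially, you submit X jobs, then submit more jobs that depend on the first set, and job dependencies let you do exactly that. There are several kinds of dependency, which you can read about in the job dependencies documentation, but here is an example that submits 3 jobs and then 3 more that will not start until the first 3 have exited.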
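# first batch
jobid1=$(qsub ...)
jobid2=$(qsub ...)
jobid3=$(qsub ...)
# next batch: each of these waits until all three jobs above have exited
# (afterany = after the listed jobs end, whatever their exit status)
depend_opts=(-W "depend=afterany:${jobid1}:${jobid2}:${jobid3}")
qsub "${depend_opts[@]}" ...
qsub "${depend_opts[@]}" ...
qsub "${depend_opts[@]}" ...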
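When the number of upstream jobs varies, the colon-separated dependency list can be built from any number of job IDs. The sketch below assumes the Torque/PBS Pro form `-W depend=afterany:id1:id2:...`; the function name make_depend is hypothetical. Swap `afterany` for `afterok` if the next batch should only run when every upstream job succeeded.

```shell
#!/bin/bash
# Hypothetical helper: join job IDs into one PBS dependency list.
# Assumes the Torque/PBS Pro syntax  -W depend=afterany:id1:id2:...
make_depend() {
    local IFS=:    # "$*" joins the arguments with the first char of IFS
    echo "depend=afterany:$*"
}

# Usage (not executed here, it needs a PBS cluster):
#   ids=()
#   for script in first.pbs second.pbs; do ids+=("$(qsub "$script")"); done
#   qsub -W "$(make_depend "${ids[@]}")" next.pbs
```

Building the string in one place keeps every job of the next batch pointing at the same set of IDs, however many were submitted.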
Answer 1 (score: 0)
One way to do this is with the GNU Parallel command 'sem'.
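I came across it while dealing with queueing myself. It acts as a counting semaphore, allowing a command to run only after other commands have exited.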
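Edit: I know the example here is very basic, but a lot can be done with parallel --semaphore, or even plain parallel, to run tasks. Have a look at the tutorial; I'm sure you will find a relevant example that helps.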
There is a great tutorial here
An example from the tutorial:
sem 'sleep 1; echo The first finished' &&
echo The first is now running in the background &&
sem 'sleep 1; echo The second finished' &&
echo The second is now running in the background
sem --wait
Output:
The first is now running in the background
The first finished
The second is now running in the background
The second finished
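For readers without GNU Parallel, the same "start in the background, then block" pattern can be sketched with bash's built-in `&` and `wait`. This is a swapped-in technique, not sem itself: plain `&` puts no limit on how many commands run at once, so the two "running" lines normally print before either "finished" line, and the two "finished" lines may appear in either order. Like sem, it only waits for local processes; since qsub returns as soon as a job is queued, neither approach waits for the PBS jobs themselves. The function name run_both is hypothetical.

```shell
#!/bin/bash
# Bash-only analogue of the sem tutorial example (no GNU Parallel needed).
run_both() {
    (sleep 1; echo "The first finished") &
    echo "The first is now running in the background"
    (sleep 1; echo "The second finished") &
    echo "The second is now running in the background"
    wait   # plays the role of `sem --wait`: block until both children exit
    echo "Both background commands have exited"
}

run_both
```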
Answer 2 (score: 0)
To actually check whether the jobs have finished, we need to use qstat with each job ID to get the job's status, and then grep that status for the status code. As long as neither your username nor your job name is "C", the following should work:
#!/bin/bash
# SECTION 1: Launch all jobs and store their job IDs in a variable
myJobs="job1.qsub job2.qsub job3.qsub" # Your job scripts here
numJobs=$(echo "$myJobs" | wc -w) # Count the jobs
myJobIDs="" # Initialize an empty list of job IDs
for job in $myJobs; do
    jobID_full=$(qsub "$job")
    # jobID_full will look like "12345.machinename", so use sed
    # to keep just the leading number
    jobID=$(echo "$jobID_full" | sed -e 's|\([0-9]*\).*|\1|')
    myJobIDs="$myJobIDs $jobID" # Add this job ID to our list
done
# SECTION 2: Check the status of each job, and leave the while loop
# only when they are all complete
numDone=0 # Initialize so that the loop starts
while [ "$numDone" -lt "$numJobs" ]; do
    numDone=0 # Reset, since we re-count on every pass
    for jobID in $myJobIDs; do # Loop through each job ID
        # The following test ONLY works if qstat never prints the string
        # ' C ' (a C surrounded by two spaces) except for a completed job.
        # I.e. if your username or job name is 'C', this won't work!
        # You could also count state ' E ' (exiting) as done if desired.
        # If qstat prints nothing (the job has already been purged from
        # the queue, or qstat itself failed), the job is counted as done
        # so that the loop cannot wait forever.
        status=$(qstat "$jobID" 2>/dev/null)
        if [ -z "$status" ] || echo "$status" | grep -q ' C '; then
            numDone=$((numDone + 1))
        fi
    done
    if [ "$numDone" -lt "$numJobs" ]; then
        echo "$numDone jobs completed out of $numJobs"
        sleep 1 # Polling interval; consider a longer one on a busy cluster
    fi
done
echo "all jobs complete"
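A more defensive variant of the polling loop can be sketched as a pair of functions. The names job_is_done, wait_for_jobs and POLL_SECONDS are hypothetical, and the sketch assumes that `qstat -f ID` prints a `job_state = X` line (as Torque and PBS Pro do), with C meaning completed in Torque and F meaning finished in PBS Pro when job history is kept. Matching the job_state field instead of ' C ' anywhere in the output avoids the username/job-name caveat, and a job that qstat no longer reports is treated as done.

```shell
#!/bin/bash
# Hypothetical helpers; check the state letters against your PBS flavour.
POLL_SECONDS=${POLL_SECONDS:-30}   # be gentle with the PBS server

# Succeeds if the job is completed/finished or no longer known to qstat.
job_is_done() {
    local state
    # Pull the value out of a line like "    job_state = R".
    # An unknown job (or a failing qstat) yields no such line, so state
    # is empty; that case is counted as done to avoid waiting forever.
    state=$(qstat -f "$1" 2>/dev/null |
        awk -F' = ' '/job_state/ {gsub(/[[:space:]]/, "", $2); print $2; exit}')
    case "$state" in
        ""|C|F) return 0 ;;   # gone from the queue, completed, or finished
        *)      return 1 ;;   # queued, running, held, exiting, ...
    esac
}

# Blocks until every job ID given as an argument is done.
wait_for_jobs() {
    local id pending
    while :; do
        pending=0
        for id in "$@"; do
            job_is_done "$id" || pending=$((pending + 1))
        done
        [ "$pending" -eq 0 ] && return 0
        echo "$pending of $# jobs still pending"
        sleep "$POLL_SECONDS"
    done
}
```

For example, once Section 1 of the script has filled myJobIDs, the whole of Section 2 could become `wait_for_jobs $myJobIDs && echo "all jobs complete"` (left unquoted on purpose so that each ID becomes its own argument).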