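I typically submit around 200 jobs to qsub at once and get flooded with 'completed successfully' messages, so I miss the few jobs that fail and their 'failed' messages.
What command can I use to retrieve a list of all the submitted jobs that failed?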
Answer 0 (score: 1)
Something like this:
while read -r line; do
    if [ -z "$line" ]; then
        continue    # skip empty lines
    elif [ -z "${line//*completed successfully*}" ]; then
        # The expansion is empty only if the line contains the phrase
        echo "The job completed successfully"
    elif [ -z "${line//*failed*}" ]; then
        echo "The job has failed"
    else
        echo "Doing something with input: $line"
    fi
done < <(qsub <query args line>)
With this approach, you can build variables that are usable in the rest of the script:
success=()    # An array, so that more than one result can be stored
while read -r line; do
    if [ -z "$line" ]; then
        continue    # skip empty lines
    elif [ -z "${line//*completed successfully*}" ]; then
        # Assuming a result of the form: The job number: #.* completed successfully
        # meaning the job number comes immediately before the word "completed"
        # and the line is space separated:
        jobnr=${line% completed successfully*}
        jobnr=${jobnr##* }
        success+=("$jobnr ok")
    elif [ -z "${line//*failed*}" ]; then
        jobnr=${line% failed*}
        jobnr=${jobnr##* }
        success+=("$jobnr failed")
    fi
done < <(qsub 20 -cmd -line -args)
printf ": %s\n" "${success[@]}"
For testing, a mock qsub function like this one can be defined beforehand:
qsub ()
{
    # Counts i down from $1 - 1 (default 9) to 0
    for ((i=${1:-10}; i--; 1))
    do
        case $((RANDOM%10)) in
            1)
                echo The job $i completed successfully.
                ;;
            2)
                echo The job $i failed.
                ;;
            *)
                echo job $i done...
                ;;
        esac;
    done
}
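To get only the failed jobs, which is what the question asks for, the same parsing can be reduced to a small filter. This is a minimal sketch: `list_failed` is a hypothetical helper name, and it assumes messages of the form "The job <N> failed.", as in the examples in this answer. Real qsub output may look different, so the pattern would need adjusting.

```shell
# Print the job number from every input line that reports a failure.
# Assumption: failure lines look like "The job <N> failed."
list_failed() {
    while IFS= read -r line; do
        case $line in
            *" failed"*)
                jobnr=${line%% failed*}   # drop " failed" and everything after it
                jobnr=${jobnr##* }        # keep the last space-separated word
                printf '%s\n' "$jobnr"
                ;;
        esac
    done
}

# Canned input standing in for real qsub output:
printf '%s\n' 'The job 7 completed successfully.' \
              'The job 3 failed.' \
              'job 5 done...' | list_failed
```

In a real script the input would come from the submission command instead of printf, for example by piping its output into `list_failed`.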
Answer 1 (score: 0)
If your qsub jobs are run in parallel with &, a good way to wait for them and see whether some of them ended in failure is:
nbf=0
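# Loop in the current shell: a "jobs -p | while read" pipeline would run the
# loop in a subshell, where wait cannot see the jobs and nbf would be lost.
for pid in $(jobs -p); do
    wait "$pid" || nbf=$((nbf + 1))
done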
echo "$nbf jobs ended with failure" >&2
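You can adapt this example to your needs (for example, replace the output of jobs -p with a specific list of jobs, or print the PID on failure or success, ...).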
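One such adaptation, as a rough sketch: collect the PIDs yourself at launch time with `$!` and report each one that fails. The helper name `wait_report` and the variable `nfailed` are made up for this illustration, and `true` / `false` stand in for the real background commands.

```shell
# Wait for each PID passed as an argument, print the ones that exited
# with a non-zero status, and leave the failure count in $nfailed.
wait_report() {
    nfailed=0
    for pid in "$@"; do
        if ! wait "$pid"; then
            echo "PID $pid failed"
            nfailed=$((nfailed + 1))
        fi
    done
}

# Stand-in commands; replace them with the real background jobs.
pids=""
for cmd in true false true; do
    "$cmd" &
    pids="$pids $!"     # remember the PID of the job just started
done

wait_report $pids       # unquoted on purpose: one argument per PID
echo "$nfailed jobs ended with failure" >&2
```

Keeping an explicit PID list avoids depending on the shell's job table, and `wait` has to run in the same shell that started the jobs, so the loop must not sit on the right-hand side of a pipe.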
Answer 2 (score: -1)
Making some assumptions:
qsub ... 2>&1 | grep -vi "completed successfully"