Getting a list of failed qsub jobs

Time: 2013-10-02 16:28:21

Tags: bash ubuntu cluster-computing qsub

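I usually submit around 200 jobs to qsub at once and get flooded with 'completed successfully' messages, so I miss the few jobs that fail and their associated 'failed' messages.

What command can I use to retrieve a list of all the submitted jobs that failed?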

3 answers:

Answer 0: (score: 1)

Something like:

while IFS= read -r line; do
    if [ -z "$line" ]; then
        continue    # skip empty lines
    elif [ -z "${line//*completed successfully*}" ]; then
        echo "The job completed"
    elif [ -z "${line//*failed*}" ]; then
        echo "The job has failed"
    else
        echo "Doing something with input: $line"
    fi
done < <(qsub <query args line>)

With this approach, you can build variables that are usable in the rest of your script:

results=()  # An array, so it can hold more than one result
while IFS= read -r line; do
    if [ -z "$line" ]; then
        continue    # skip empty lines
    elif [ -z "${line//*completed successfully*}" ]; then
        # Assuming a line of the form: The job number: #.* completed successfully
        # meaning the job number comes immediately before the word "completed"
        # and the line is space separated:
        jobnr=${line% completed successfully*}
        jobnr=${jobnr##* }
        results+=("$jobnr ok")
    elif [ -z "${line//*failed*}" ]; then
        jobnr=${line% failed*}
        jobnr=${jobnr##* }
        results+=("$jobnr failed")
    fi
done < <(qsub 20 -cmd -line -args)
printf ": %s\n" "${results[@]}"
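
Since the question asks only for the failed jobs, the same parsing can be narrowed to collect just those. The following is a minimal sketch, not a definitive implementation: `mock_qsub` and `failed_jobs` are hypothetical names, and the message format ("The job N failed.") is the one assumed in this answer, not something qsub is known to print.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the real output; the message format is an assumption.
mock_qsub() {
    printf '%s\n' \
        'The job 3 completed successfully.' \
        'The job 2 failed.' \
        'job 1 done...' \
        'The job 0 failed.'
}

# Read status lines on stdin and print the ID of every job reported as failed.
failed_jobs() {
    local line jobnr
    while IFS= read -r line; do
        [[ $line == *" failed"* ]] || continue
        jobnr=${line%% failed*}   # drop " failed" and everything after it
        jobnr=${jobnr##* }        # keep the last space-separated word
        printf '%s\n' "$jobnr"
    done
}

# mapfile needs bash 4 or later.
mapfile -t failed < <(mock_qsub | failed_jobs)
echo "${#failed[@]} failed: ${failed[*]}"
```

With real output, `mock_qsub` would be replaced by whatever command actually prints the status messages.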

Test it with:

# Mock qsub for testing: prints one random status line per job.
qsub ()
{
  for ((i=${1:-10}; i--; ))
  do
    case $((RANDOM%10)) in
        1)
            echo "The job $i completed successfully."
        ;;
        2)
            echo "The job $i failed."
        ;;
        *)
            echo "job $i done..."
        ;;
    esac
  done
}

Answer 1: (score: 0)

If your qsub jobs are run in parallel with &, a good way to wait for them and check whether any of them ended in failure is:

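nbf=0
# Loop in the current shell: piping jobs -p into while would run the loop in
# a subshell, where wait cannot see these jobs and the count would be lost.
for pid in $(jobs -p); do
    wait "$pid" || nbf=$((nbf + 1))
done
echo "$nbf jobs ended with failure" >&2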

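You can adapt this example to your needs (for example, by replacing the output of jobs -p with a specific list of jobs, or by printing the PID on failure or success, ...).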
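
One such adaptation, sketched under assumptions: the PIDs are collected from `$!` at launch instead of from jobs -p, and the PIDs that failed are recorded. The `true` and `false` commands are hypothetical stand-ins for background commands that block until the job finishes and exit non-zero on failure. Whether a given qsub invocation behaves that way depends on how it is called, since a plain submission typically returns as soon as the job is queued.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for blocking job commands: two succeed, one fails.
pids=()
true  & pids+=("$!")
false & pids+=("$!")
true  & pids+=("$!")

failed_pids=()
for pid in "${pids[@]}"; do
    # wait returns the exit status of the given background process.
    wait "$pid" || failed_pids+=("$pid")
done

echo "${#failed_pids[@]} of ${#pids[@]} jobs ended with failure" >&2
if ((${#failed_pids[@]})); then
    printf 'failed PID: %s\n' "${failed_pids[@]}" >&2
fi
```

Keeping an explicit PID list avoids depending on the shell's job table and makes it easy to map each PID back to the command that was launched.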

Answer 2: (score: -1)

Making some assumptions:

qsub ... 2>&1 | grep -vi "completed successfully"
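
The filter above hides the success messages. A variant that keeps only the failure lines might look like the sketch below. It assumes that every failure line contains the word "failed", as described in the question. `fake_qsub` is a hypothetical stand-in and its message format is invented for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the real command; the messages are assumptions.
fake_qsub() {
    echo 'The job 12 completed successfully.'
    echo 'The job 13 failed.' >&2
    echo 'The job 14 completed successfully.'
    echo 'The job 15 failed.'
}

# Merge stderr into stdout, then keep only the lines that mention a failure.
failed_lines=$(fake_qsub 2>&1 | grep -i 'failed')
printf '%s\n' "$failed_lines"
```

Matching on "failed" rather than excluding "completed successfully" also drops unrelated output, at the cost of missing failures that are worded differently.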