Meaning of jStatus log values in LSF

Posted: 2015-01-12 09:51:37

Tags: sas lsf

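I am currently trying to decipher the contents of the lsb.events log file, which is created by version 8.1 of Platform Computing's "Platform Process Manager" (Flow Manager).

From various documentation sources, I have found the following descriptions of the jStatus variable: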

  • 4 = RUN
  • 32 = JOB_STAT_EXIT
  • 64 = JOB_STAT_DONE

However, the JOB_STATUS entries also contain jStatus values of 2 and 192. What do these values represent?

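Tagged SAS because this implementation is bundled with it. As a side note, I have observed that in some cases the actual fields in our lsb.events file do not line up with the fields that should appear according to the documentation mentioned above.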

2 Answers:

Answer 0: (score: 2)

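A status of 2 means the job is in the PSUSP state, which a job can reach in several ways (for example, by submitting it with the -H option so that it is held instead of being scheduled).

As for 192, the answer is that the job status is a bit field. In this case, two bits are set: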

  • 64 = JOB_STAT_DONE
  • 128 = JOB_STAT_PDONE

JOB_STAT_PDONE indicates that the job had a post-execution script defined and that the script completed successfully.
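The bit test described above can be sketched in C. This is only an illustration: the two constants repeat the values 64 and 128 listed above, and `job_stat_has` is a hypothetical helper name, not something LSF provides.

```c
#include <stdbool.h>

/* Values repeated from the list above (64 and 128). */
#define JOB_STAT_DONE  0x40  /* 64  */
#define JOB_STAT_PDONE 0x80  /* 128 */

/* Hypothetical helper: true when every bit of `flag` is set in `jstatus`. */
static bool job_stat_has(int jstatus, int flag)
{
    return (jstatus & flag) == flag;
}
```

With these definitions, `job_stat_has(192, JOB_STAT_DONE)` and `job_stat_has(192, JOB_STAT_PDONE)` are both true, because 192 equals `JOB_STAT_DONE | JOB_STAT_PDONE`, while `job_stat_has(2, JOB_STAT_DONE)` is false.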

The valid values for the job status bits are defined in the lsf/lsbatch.h file that ships with LSF, in the include directory: <LSF_INSTALL_DIR>/<LSF_VERSION>/include/lsf/lsbatch.h

Answer 1: (score: 0)

To expand on this, with thanks to @Squirrel, the relevant contents of our C:\LSF_7.0\7.0\include\lsf\lsbatch.h file are:

/**
 * \addtogroup job_states job_states
 * define job states
 */
/*@{*/
#define JOB_STAT_NULL         0x00       /**< State null*/
#define JOB_STAT_PEND         0x01       /**< The job is pending, i.e., it 
                                            * has not been dispatched yet.*/
#define JOB_STAT_PSUSP        0x02       /**< The pending job was suspended by its
                                            * owner or the LSF system administrator.*/
#define JOB_STAT_RUN          0x04       /**< The job is running.*/
#define JOB_STAT_SSUSP        0x08       /**< The running job was suspended 
                                           * by the system because an execution 
                                           * host was overloaded or the queue run 
                                           * window closed. (see \ref lsb_queueinfo, 
                                           * \ref lsb_hostinfo, and lsb.queues.)
                                           */
#define JOB_STAT_USUSP        0x10       /**< The running job was suspended by its 
                                           * owner or the LSF system administrator.*/
#define JOB_STAT_EXIT         0x20       /**< The job has terminated with a non-zero
                                           * status - it may have been aborted due 
                                           * to an error in its execution, or 
                                           * killed by its owner or by the 
                                           * LSF system administrator.*/
#define JOB_STAT_DONE         0x40       /**< The job has terminated with status 0.*/
#define JOB_STAT_PDONE        (0x80)     /**< Post job process done successfully */
#define JOB_STAT_PERR         (0x100)    /**< Post job process has error */
#define JOB_STAT_WAIT         (0x200)    /**< Chunk job waiting its turn to exec */
#define JOB_STAT_RUNKWN       0x8000     /* Flag : Job status is UNKWN caused by 
                                          * losting contact with remote cluster */
#define JOB_STAT_UNKWN        0x10000    /**< The slave batch daemon (sbatchd) on 
                                          * the host on which the job is processed 
                                          * has lost contact with the master batch 
                                          * daemon (mbatchd).*/

Again, in decimal:

0       JOB_STAT_NULL
1       JOB_STAT_PEND
2       JOB_STAT_PSUSP
4       JOB_STAT_RUN
8       JOB_STAT_SSUSP 
16      JOB_STAT_USUSP 
32      JOB_STAT_EXIT 
64      JOB_STAT_DONE
128     JOB_STAT_PDONE 
256     JOB_STAT_PERR 
512     JOB_STAT_WAIT
32768   JOB_STAT_RUNKWN 
65536   JOB_STAT_UNKWN
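
Since jStatus is a bit field, a value from lsb.events can be broken down into these names mechanically. The following is a minimal sketch in C, assuming only the values listed above; `describe_jstatus` and the `JOB_STAT_NAMES` table are hypothetical names made up for this example, and bits that are not in the table are silently ignored.

```c
#include <stddef.h>
#include <stdio.h>

/* Bit values and names copied from the lsbatch.h excerpt above. */
struct job_stat_name { int bit; const char *name; };

static const struct job_stat_name JOB_STAT_NAMES[] = {
    { 0x01,    "JOB_STAT_PEND"   },
    { 0x02,    "JOB_STAT_PSUSP"  },
    { 0x04,    "JOB_STAT_RUN"    },
    { 0x08,    "JOB_STAT_SSUSP"  },
    { 0x10,    "JOB_STAT_USUSP"  },
    { 0x20,    "JOB_STAT_EXIT"   },
    { 0x40,    "JOB_STAT_DONE"   },
    { 0x80,    "JOB_STAT_PDONE"  },
    { 0x100,   "JOB_STAT_PERR"   },
    { 0x200,   "JOB_STAT_WAIT"   },
    { 0x8000,  "JOB_STAT_RUNKWN" },
    { 0x10000, "JOB_STAT_UNKWN"  },
};

/* Hypothetical helper: writes the names of all set bits, joined by '|',
 * into buf (at most len bytes, always NUL-terminated) and returns buf. */
static char *describe_jstatus(int jstatus, char *buf, size_t len)
{
    size_t used = 0;

    if (len == 0)
        return buf;
    buf[0] = '\0';

    if (jstatus == 0) {
        snprintf(buf, len, "JOB_STAT_NULL");
        return buf;
    }

    for (size_t i = 0; i < sizeof JOB_STAT_NAMES / sizeof JOB_STAT_NAMES[0]; i++) {
        if ((jstatus & JOB_STAT_NAMES[i].bit) == 0)
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? "|" : "", JOB_STAT_NAMES[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;  /* output was truncated; stop appending */
        used += (size_t)n;
    }
    return buf;
}
```

Called with a 128-byte buffer, `describe_jstatus(192, buf, sizeof buf)` produces `JOB_STAT_DONE|JOB_STAT_PDONE`, and a value of 2 produces `JOB_STAT_PSUSP`.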