LSF job states for a given job

Time: 2017-01-02 07:00:44

Tags: lsf

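Suppose my job ran for a while, was suspended because the machine was overloaded, and some time later resumed running and completed. The states this job went through are RUNNING -> SUSPENDED -> RUNNING.

How can I get all the states that a given job has gone through?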

1 answer:

Answer 0 (score: 0)

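Use bjobs -l if the job has not yet been cleaned from the system.

Use bhist -l otherwise. You may also need -n so that bhist searches older event log files, depending on the age of the job.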

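Below is an example of bhist -l output for a job that was suspended and later resumed because the system load temporarily exceeded the configured threshold.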

$ bhist -l 1168

Job <1168>, User <mclosson>, Project <default>, Command <sleep 10000>
Fri Jan 20 15:08:40: Submitted from host <hostA>, to 
                 Queue <normal>, CWD <$HOME>, Specified Hosts <hostA>;
Fri Jan 20 15:08:41: Dispatched 1 Task(s) on Host(s) <hostA>, Allocated 1 Slot(
                 s) on Host(s) <hostA>, Effective RES_REQ <select[type == any] or
                 der[r15s:pg] >;
Fri Jan 20 15:08:41: Starting (Pid 30234);
Fri Jan 20 15:08:41: Running with execution home </home/mclosson>, Execution CW
                 D </home/mclosson>, Execution Pid <30234>;
Fri Jan 20 16:19:22: Suspended:  Host load exceeded threshold:  1-minute CPU ru
                 n queue length (r1m)
Fri Jan 20 16:21:43: Running;

Summary of time in seconds spent in various states by  Fri Jan 20 16:22:09
  PEND     PSUSP    RUN      USUSP    SSUSP    UNKWN    TOTAL
  1        0        4267     0        141      0        4409        

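At 16:19:22 the job was suspended because r1m exceeded its threshold. Later, at 16:21:43, the job resumed running. The 141 seconds between these two events match the SSUSP column in the summary.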
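
If you want the sequence of states in a script rather than reading it by eye, the event lines can be pulled out of the bhist -l text. The following is only a rough sketch, not an official LSF interface: it assumes every event line starts with a timestamp in the format shown above and that wrapped continuation lines are indented. The names parse_events and job_events are hypothetical helpers made up for this illustration, and job_events assumes bhist is on your PATH.

```python
import re
import subprocess

# Event lines in `bhist -l` output begin with a timestamp such as
# "Fri Jan 20 16:19:22:"; wrapped continuation lines are indented instead.
EVENT_RE = re.compile(r"^(\w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2}): (.*)$")

# A few event lines copied from the example output above.
SAMPLE = """\
Fri Jan 20 15:08:41: Starting (Pid 30234);
Fri Jan 20 15:08:41: Running with execution home </home/mclosson>, Execution CW
                 D </home/mclosson>, Execution Pid <30234>;
Fri Jan 20 16:19:22: Suspended:  Host load exceeded threshold:  1-minute CPU ru
                 n queue length (r1m)
Fri Jan 20 16:21:43: Running;
"""


def parse_events(bhist_output):
    """Return a list of (timestamp, keyword) tuples, one per event line."""
    events = []
    for line in bhist_output.splitlines():
        match = EVENT_RE.match(line)
        if match:
            timestamp, text = match.groups()
            # Keep only the leading word, e.g. "Suspended" or "Running".
            keyword = re.split(r"[\s:;(]", text.strip(), maxsplit=1)[0]
            events.append((timestamp, keyword))
    return events


def job_events(job_id):
    """Hypothetical helper: run `bhist -l <job_id>` and parse its output."""
    result = subprocess.run(
        ["bhist", "-l", str(job_id)],
        capture_output=True, text=True, check=True,
    )
    return parse_events(result.stdout)


if __name__ == "__main__":
    for timestamp, keyword in parse_events(SAMPLE):
        print(timestamp, keyword)
```

The keyword is just the first word of each event, so the exact wording may differ between LSF versions; check it against your own bhist output before relying on it.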