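I am using the SLURM workload manager to schedule jobs on a Linux cluster running Ubuntu Server 14.04.3. I noticed that sinfo reports every node that has any jobs as mixed, whether it is partially or fully allocated; idle nodes are correctly reported as idle. Here is the output of the sinfo command: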
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
compute* up infinite 5 mix node[01-05]
compute* up infinite 1 idle node06
However, node04 is fully allocated, so sinfo should report its state as alloc, whereas node03 is only partially allocated. The scontrol command shows this:
scontrol show node node04
CPUAlloc=6 CPUErr=0 CPUTot=6 CPULoad=6.01 Features=(null)
Gres=(null)
NodeAddr=node04 NodeHostName=node04
OS=Linux RealMemory=64333 AllocMem=0 Sockets=1 Boards=1
State=ALLOCATED ThreadsPerCore=1 TmpDisk=0 Weight=1
BootTime=2016-04-11T16:38:52 SlurmdStartTime=2016-04-11T16:39:59
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
scontrol show node node03
CPUAlloc=1 CPUErr=0 CPUTot=6 CPULoad=1.01 Features=(null)
Gres=(null)
NodeAddr=node03 NodeHostName=node03
OS=Linux RealMemory=64333 AllocMem=0 Sockets=1 Boards=1
State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1
BootTime=2016-04-11T16:38:38 SlurmdStartTime=2016-04-11T16:39:08
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
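To make the expectation concrete, here is a minimal Python sketch that reads the Key=Value pairs from scontrol output and classifies a node by CPU counts alone. It assumes that "fully allocated" means CPUAlloc equals CPUTot. Slurm itself may also take memory, GRES or other flags into account, so treat this as an approximation. parse_scontrol and expected_cpu_state are hypothetical helpers, not part of Slurm.

```python
import re


def parse_scontrol(text):
    """Parse the Key=Value pairs of `scontrol show node` output into a dict."""
    # Each field looks like Key=Value with no spaces inside the value,
    # e.g. CPUAlloc=6 or Features=(null).
    return dict(re.findall(r"(\w+)=(\S+)", text))


def expected_cpu_state(fields):
    """Hypothetical rule: derive the sinfo-style state from CPU counts only.

    Ignores memory, GRES and drain/down flags.
    """
    alloc = int(fields["CPUAlloc"])
    total = int(fields["CPUTot"])
    if alloc == 0:
        return "idle"
    if alloc < total:
        return "mix"
    return "alloc"


if __name__ == "__main__":
    node04 = parse_scontrol(
        "CPUAlloc=6 CPUErr=0 CPUTot=6 CPULoad=6.01 Features=(null)\n"
        "State=ALLOCATED ThreadsPerCore=1 TmpDisk=0 Weight=1"
    )
    node03 = parse_scontrol(
        "CPUAlloc=1 CPUErr=0 CPUTot=6 CPULoad=1.01 Features=(null)\n"
        "State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1"
    )
    for name, fields in (("node04", node04), ("node03", node03)):
        print(name, fields["State"], expected_cpu_state(fields))
```

With the values shown above, node04 comes out as alloc (matching State=ALLOCATED) and node03 as mix (matching State=MIXED). That is the discrepancy with the sinfo listing, which groups node04 under mix.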
What is wrong with sinfo?
Thanks in advance for any suggestions!