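I would like to know why the sinfo command reports different information from what squeue returns. I have repeatedly seen the number of allocated nodes reported by sinfo fail to match the number of allocated nodes shown by squeue.
Has anyone else noticed this?
I am using version 17.02, but I have also run into this in earlier versions.
For example, I ran squeue and sinfo at the same moment, and this is what I got: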
Thu Aug 24 10:12:50 2017
JOBID PARTITION NAME ST TIME NODES NODELIST(REASON)
57011 dmrTest jacobi-98 R 3:02 4 s07r2b[15-18]
57012 dmrTest cg-99 R 3:02 4 s07r2b[11-14]
57010 dmrTest jacobi-98 R 3:24 32 s01r1b[39,42,45-48],s07r2b[09-10],s09r2b[54-58,63-64,67],s14r1b[61-62,65-67],s14r2b[49,54-57,61-63,65],s24r2b[57,59]
57008 dmrTest nbody-96 R 3:52 8 s09r1b[53-55,57-59,61-62]
57009 dmrTest nbody-97 R 3:52 4 s10r2b[49-50],s14r2b[69,72]
57007 dmrTest nbody-95 R 4:21 4 s24r2b[49-52]
57006 dmrTest cg-94 R 4:26 4 s14r1b[49-52]
57004 dmrTest cg-92 R 4:42 4 s01r1b[33-35,37]
57003 dmrTest nbody-91 R 4:45 4 s10r2b[61,63-65]
57001 dmrTest nbody-89 R 4:46 8 s10r2b[51-52],s14r2b[50-53],s24r2b[53-54]
57002 dmrTest cg-90 R 4:46 2 s24r2b[62-63]
57000 dmrTest nbody-88 R 5:00 8 s09r2b[49,51-53],s14r1b[57-60]
56999 dmrTest nbody-87 R 5:02 4 s07r2b[19-20,22-23]
56997 dmrTest cg-85 R 5:36 8 s01r1b[29-32],s10r2b[53-56]
56995 dmrTest cg-83 R 7:08 2 s10r2b[66-67]
56988 dmrTest jacobi-76 R 8:28 2 s14r1b[53-54]
56985 dmrTest cg-74 R 8:36 4 s09r2b[68-70,72]
56978 dmrTest cg-69 R 9:37 2 s14r1b[55-56]
56976 dmrTest cg-67 R 9:40 2 s14r1b[63-64]
56974 dmrTest cg-65 R 10:49 4 s10r2b[69-70],s24r2b[64-65]
56973 dmrTest cg-64 R 10:58 4 s09r1b[49-52]
56971 dmrTest cg-62 R 11:22 2 s14r2b[67-68]
56969 dmrTest cg-60 R 11:35 2 s24r2b[55-56]
If you add up the NODES column of the squeue output, you get 122 allocated nodes, but if you check sinfo, this is what you have:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
dmrTest* up infinite 94 alloc s01r1b[29-35,37,47-48],s07r2b[09-20,22-23],s09r1b[49-55,57-59,61-62],s09r2b[49,51-53,68-70,72],s10r2b[49-56,61,63-67,69-70],s14r1b[49-60,63-64],s14r2b[50-53,67-69,72],s24r2b[49-56,62-65]
dmrTest* up infinite 34 idle s01r1b[39,42,45-46],s09r1b[63-66],s09r2b[54-58,63-64,67],s14r1b[61-62,65-67],s14r2b[49,54-57,61-63,65],s24r2b[57,59-61]
That is only 94 allocated nodes.
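
To see where the two views diverge, the node lists can be expanded and compared as sets. The following is only a minimal sketch: `expand_hostlist` is a hypothetical helper written for this post (not part of Slurm), and it handles only the simple `prefix[a-b,c]` form that appears in the output above. The three strings are copied from the squeue line for job 57010 and from the two sinfo lines.

```python
import re

def expand_hostlist(hostlist):
    """Expand 'a[1-3,5],b[07-08]' into a set of node names.

    Hypothetical helper: supports only prefix[ranges] groups as seen above.
    """
    nodes = set()
    # Each group is an alphanumeric prefix followed by one bracketed index list.
    for prefix, body in re.findall(r"([A-Za-z0-9]+)\[([^\]]+)\]", hostlist):
        for part in body.split(","):
            lo, _, hi = part.partition("-")
            hi = hi or lo          # a single index is a range of length one
            width = len(lo)        # preserve zero padding such as 09
            for i in range(int(lo), int(hi) + 1):
                nodes.add(f"{prefix}{i:0{width}d}")
    return nodes

# Node list of job 57010 as reported by squeue.
job_57010 = ("s01r1b[39,42,45-48],s07r2b[09-10],s09r2b[54-58,63-64,67],"
             "s14r1b[61-62,65-67],s14r2b[49,54-57,61-63,65],s24r2b[57,59]")

# Node lists reported by sinfo for the alloc and idle states.
sinfo_alloc = ("s01r1b[29-35,37,47-48],s07r2b[09-20,22-23],"
               "s09r1b[49-55,57-59,61-62],s09r2b[49,51-53,68-70,72],"
               "s10r2b[49-56,61,63-67,69-70],s14r1b[49-60,63-64],"
               "s14r2b[50-53,67-69,72],s24r2b[49-56,62-65]")
sinfo_idle = ("s01r1b[39,42,45-46],s09r1b[63-66],s09r2b[54-58,63-64,67],"
              "s14r1b[61-62,65-67],s14r2b[49,54-57,61-63,65],s24r2b[57,59-61]")

job_nodes = expand_hostlist(job_57010)
alloc_nodes = expand_hostlist(sinfo_alloc)
idle_nodes = expand_hostlist(sinfo_idle)

print("nodes in job 57010:      ", len(job_nodes))
print("of those, alloc in sinfo:", len(job_nodes & alloc_nodes))
print("of those, idle in sinfo: ", len(job_nodes & idle_nodes))
print("idle according to sinfo: ", sorted(job_nodes & idle_nodes))
```

With the data above, 28 of the 32 nodes that squeue assigns to job 57010 show up as idle in sinfo, and only 4 show up as alloc, so the mismatch in this snapshot appears to be concentrated in that one job. On a live system, `scontrol show hostnames` can expand a node list without a hand-written parser.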