SGE job gets killed

Time: 2017-03-31 15:14:50

Tags: parallel-processing cluster-computing

I am trying to run a job on SGE, but it keeps getting killed. I am not sure which parameter in my script I should change.

My submit.sh script:

===========

#$ -l mem_free=32G
#$ -l h_rt=48:00:00

## softx will require 8 processors
softx myprogram.sh

==========

I submit it to SGE with:

qsub -q long.q submit.sh

What should I change?

Details of the killed job and the queue defaults are below:

qacct -j 740

==============================================================

qname        long.q
hostname     node02.local
department   defaultdepartment
jobname      submit.sh
jobnumber    740
taskid       undefined
account      sge
priority     0

granted_pe   NONE
slots        1
failed       37  : qmaster enforced h_rt, h_cpu, or h_vmem limit
exit_status  137                  (Killed)
ru_wallclock 1588s
ru_utime     0.110s
ru_stime     0.190s
ru_maxrss    5.520KB
ru_ixrss     0.000B
ru_ismrss    0.000B
ru_idrss     0.000B
ru_isrss     0.000B
ru_minflt    25267
ru_majflt    0
ru_nswap     0
ru_inblock   0
ru_oublock   176
ru_msgsnd    0
ru_msgrcv    0
ru_nsignals  0
ru_nvcsw     351
ru_nivcsw    95
cpu          10096.930s
mem          429.730GBs
io           76.911GB
iow          0.000s
maxvmem      8.635GB
arid         undefined
ar_sub_time  undefined
ar_sub_time  undefined

category     -q long.q -l h_rt=172800,mem_free=32G

=====

qconf -sq long.q

qname                 long.q
s_rt                  864000
h_rt                  864000
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                8g

1 answer:

Answer 0 (score: 1)

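The queue has an h_vmem limit of 8g, and that limit is enforced on every job in the queue regardless of what the job requests. The job was killed after 1588 s (under half an hour), far short of the 48 hours requested, so the h_rt limit is not the cause. The maxvmem reported for the job (8.635GB) exceeds the queue's h_vmem limit. You will need to either talk to your cluster administrator about how to submit jobs like this, or change the problem so that it uses less virtual memory.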
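
The comparison above can be sketched as a small shell check. This is only an illustration: `to_bytes` is a hypothetical helper, not part of SGE, and the two values are copied by hand from the `qacct -j 740` and `qconf -sq long.q` output in the question. It assumes that lowercase suffixes (k, m, g) are powers of 1000 and uppercase ones (K, M, G) are powers of 1024, as described in sge_types(1), and that the "GB" shown by qacct is 1024-based. The conclusion for this job is the same under either reading.

```shell
#!/bin/sh
# Hypothetical helper: convert a memory value such as "8g" or "8.635GB" to bytes.
# Assumption: k/m/g are powers of 1000, K/M/G are powers of 1024.
to_bytes() {
    printf '%s\n' "$1" | awk '
    {
        value = $1 + 0                 # leading numeric part, e.g. 8.635
        unit = $1
        sub(/^[0-9.]+/, "", unit)      # keep only the suffix, e.g. "GB"
        sub(/[Bb]$/, "", unit)         # drop a trailing "B", e.g. "GB" -> "G"
        mult = 1
        if (unit == "k") mult = 1000
        else if (unit == "K") mult = 1024
        else if (unit == "m") mult = 1000000
        else if (unit == "M") mult = 1048576
        else if (unit == "g") mult = 1000000000
        else if (unit == "G") mult = 1073741824
        printf "%.0f\n", value * mult
    }'
}

maxvmem="8.635GB"   # maxvmem line from qacct -j 740
h_vmem="8g"         # h_vmem line from qconf -sq long.q

used=$(to_bytes "$maxvmem")
limit=$(to_bytes "$h_vmem")

# Report whether peak virtual memory went over the queue hard limit.
if [ "$used" -gt "$limit" ]; then
    echo "maxvmem ($maxvmem) exceeds h_vmem ($h_vmem)"
else
    echo "maxvmem ($maxvmem) is within h_vmem ($h_vmem)"
fi
```

For the values in the question this reports that maxvmem exceeds h_vmem, which matches the "qmaster enforced h_rt, h_cpu, or h_vmem limit" failure in the accounting record.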