I'm trying to run a job on SGE, but it keeps getting killed. I'm not sure which parameter in my script I should change.
My submit.sh script:
===========
#$ -l mem_free=32G
#$ -l h_rt=48:00:00
## softx will require 8 processors
softx myprogram.sh
==========
I submit it to SGE with:
qsub -q long.q submit.sh
What should I change?
Details of the killed job and the queue defaults are below.
qacct -j 740
==============================================================
qname long.q
hostname node02.local
department defaultdepartment
jobname submit.sh
jobnumber 740
taskid undefined
account sge
priority 0
granted_pe NONE
slots 1
failed 37 : qmaster enforced h_rt, h_cpu, or h_vmem limit
exit_status 137 (Killed)
ru_wallclock 1588s
ru_utime 0.110s
ru_stime 0.190s
ru_maxrss 5.520KB
ru_ixrss 0.000B
ru_ismrss 0.000B
ru_idrss 0.000B
ru_isrss 0.000B
ru_minflt 25267
ru_majflt 0
ru_nswap 0
ru_inblock 0
ru_oublock 176
ru_msgsnd 0
ru_msgrcv 0
ru_nsignals 0
ru_nvcsw 351
ru_nivcsw 95
cpu 10096.930s
mem 429.730GBs
io 76.911GB
iow 0.000s
maxvmem 8.635GB
arid undefined
ar_sub_time undefined
category -q long.q -l h_rt=172800,mem_free=32G
=====
qconf -sq long.q
qname long.q
s_rt 864000
h_rt 864000
s_cpu INFINITY
h_cpu INFINITY
s_fsize INFINITY
h_fsize INFINITY
s_data INFINITY
h_data INFINITY
s_rss INFINITY
h_rss INFINITY
s_vmem INFINITY
h_vmem 8g
Answer 0 (score: 1)
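The queue has an h_vmem limit of 8g, which is enforced on every job regardless of what the job requests. The job was killed after about 26 minutes (ru_wallclock 1588s), far short of the 48-hour h_rt you requested, so the run-time limit is not the cause. The maxvmem reported for the job (8.635GB) exceeds the queue limit. You will need to either ask your cluster administrators how to submit jobs like this, or change the problem so that it uses less virtual memory.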
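If you want to double-check that reasoning yourself, here is a rough sketch that compares the numbers from the qacct and qconf output. The helper names (to_bytes, hms_to_seconds, exceeds) are made up for this example and are not part of SGE. The suffix handling is an assumption based on the convention described in sge_types(1): lowercase k/m/g as powers of 1000, uppercase K/M/G as powers of 1024. It does not handle INFINITY.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not SGE commands.

# Convert an SGE-style size string (e.g. "8g", "8.635GB") to bytes.
# Assumption: lowercase k/m/g are powers of 1000, uppercase K/M/G are
# powers of 1024; a trailing "B" (as printed by qacct) is ignored.
to_bytes() {
    printf '%s\n' "$1" | awk '{
        s = $0
        sub(/[Bb]$/, "", s)               # drop trailing "B", e.g. "8.635GB"
        unit = substr(s, length(s), 1)    # last character may be a suffix
        mult = 1
        if      (unit == "k") mult = 1000
        else if (unit == "K") mult = 1024
        else if (unit == "m") mult = 1000 * 1000
        else if (unit == "M") mult = 1024 * 1024
        else if (unit == "g") mult = 1000 * 1000 * 1000
        else if (unit == "G") mult = 1024 * 1024 * 1024
        if (mult != 1) s = substr(s, 1, length(s) - 1)
        printf "%.0f\n", s * mult
    }'
}

# Convert an h_rt value given as HH:MM:SS to seconds.
hms_to_seconds() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Succeed (exit 0) if the first size is larger than the second.
exceeds() {
    a=$(to_bytes "$1")
    b=$(to_bytes "$2")
    awk -v a="$a" -v b="$b" 'BEGIN { exit !(a + 0 > b + 0) }'
}

# Values copied from the qacct / qconf output in the question.
if exceeds 8.635GB 8g; then
    echo "maxvmem is above the queue h_vmem limit"
fi
if [ 1588 -lt "$(hms_to_seconds 48:00:00)" ]; then
    echo "ru_wallclock is below the requested h_rt"
fi
```

With the values from the question, both checks should pass whichever way the g suffix is interpreted, which points to h_vmem rather than h_rt.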