SGE fails to submit job: attribute is not a memory value

Time: 2016-08-05 08:39:18

Tags: qsub sungridengine

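I cannot submit a job using the mem attribute. I am new to SGE, and after two days of googling I am asking for help here. Any suggestions will be appreciated!

Here is what I did:

1. Submitted my script: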

qsub -S /bin/bash -A assembly -pe threads 16 -l mem=2GB -cwd -N "pBcR_correct_asm" -j y -o /dev/null runCorrection.sh

Unable to run job: unknown resource "mem".
Exiting.

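2. Replacing "host" with "h" had solved an earlier problem of mine (following the question SGE unknown resource "nodes"), so I tried replacing "mem" with "m", but that did not work.

3. After googling, I learned that "h" is a shortcut defined in "/opt/gridengine/util/resources/centry/hostname", which can be confirmed with "qconf -sc":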

qconf -sc

#name               shortcut   type        relop requestable consumable default  urgency 
#----------------------------------------------------------------------------------------
arch                a          RESTRING    ==    YES         NO         NONE     0
calendar            c          RESTRING    ==    YES         NO         NONE     0
cpu                 cpu        DOUBLE      >=    YES         NO         0        0
display_win_gui     dwg        BOOL        ==    YES         NO         0        0
h_core              h_core     MEMORY      <=    YES         NO         0        0
h_cpu               h_cpu      TIME        <=    YES         NO         0:0:0    0
h_data              h_data     MEMORY      <=    YES         NO         0        0
h_fsize             h_fsize    MEMORY      <=    YES         NO         0        0
h_rss               h_rss      MEMORY      <=    YES         NO         0        0
h_rt                h_rt       TIME        <=    YES         NO         0:0:0    0
h_stack             h_stack    MEMORY      <=    YES         NO         0        0
h_vmem              h_vmem     MEMORY      <=    YES         NO         0        0
hostname            h          HOST        ==    YES         NO         NONE     0
load_avg            la         DOUBLE      >=    NO          NO         0        0
load_long           ll         DOUBLE      >=    NO          NO         0        0
load_medium         lm         DOUBLE      >=    NO          NO         0        0
load_short          ls         DOUBLE      >=    NO          NO         0        0
m_core              core       INT         <=    YES         NO         0        0
m_socket            socket     INT         <=    YES         NO         0        0
m_topology          topo       RESTRING    ==    YES         NO         NONE     0
m_topology_inuse    utopo      RESTRING    ==    YES         NO         NONE     0
mem_free            mf         MEMORY      <=    YES         NO         0        0
mem_total           mt         MEMORY      <=    YES         NO         0        0
mem_used            mu         MEMORY      >=    YES         NO         0        0

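4. So I replaced "mem" with "mt", but then it complained about the attribute. Judging from the output above, mem_total looks almost the same as "hostname", which worked before. After going through the SGE guide I suspected that JSV might be the problem, but I could not find any script containing "Unable to run job: attribute ..." under the directory "/opt/gridengine/util/resources/jsv". I guess I have to configure some files, but which files, and how?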

qsub -S /bin/bash -A assembly -pe threads 16 -l mt=2GB -cwd -N "pBcR_correct_asm" -j y -o test.out  runCorrection.sh

Unable to run job: attribute "mem_total" is not a memory value.
Exiting.

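2 answers:

Answer 0 (score: 1)

@Vince!

Thank you very much for your reply.

I finally solved my problem by using "h_vmem=2g" ("2GB" gives an error), but I do not know where to find how the value of a complex of type MEMORY is supposed to be written.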
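
The difference between "2g" and "2GB" can be checked before submitting. The sketch below assumes that a memory value is a decimal integer optionally followed by one multiplier letter (k, K, m, M, g, G), which is my reading of sge_types(1); hexadecimal and octal forms are deliberately left out. The function name is_sge_memory_value is made up for this example and is not part of SGE.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): succeeds when the argument looks
# like a memory value, i.e. decimal digits followed by at most one
# multiplier letter. Assumption: trailing units such as "B" are rejected.
is_sge_memory_value() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]+[kKmMgG]?$'
}

# Try the values from this thread plus a few variants.
for value in 2g 2G 2048M 2GB 2gb; do
    if is_sge_memory_value "$value"; then
        echo "$value: ok"
    else
        echo "$value: not a memory value"
    fi
done
```

Under that assumption "2GB" is rejected only because of the trailing "B", which would match the error message seen above.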

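The information below is no longer necessary.

I have read the website you provided and set the complexes h_vmem and s_vmem to "consumable", but it did not work. I think I have to configure the queue's "complex_values", which is currently "NONE". However, I cannot open the page http://gridscheduler.sourceforge.net/htmlman/htmlman5/sge_types.html?pathrev=V62u5_TAG, which might tell me how to configure it. Am I right that the queue needs to be configured? Do I have to configure the hosts as well?

Any suggestions will be appreciated!

Here is what I did:

1. Changed the consumable attribute to "YES" for h_vmem and s_vmem: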

qconf -sc

#name               shortcut   type        relop requestable consumable default  urgency 
#----------------------------------------------------------------------------------------
arch                a          RESTRING    ==    YES         NO         NONE     0
calendar            c          RESTRING    ==    YES         NO         NONE     0
cpu                 cpu        DOUBLE      >=    YES         NO         0        0
display_win_gui     dwg        BOOL        ==    YES         NO         0        0
h_core              h_core     MEMORY      <=    YES         NO         0        0
h_cpu               h_cpu      TIME        <=    YES         NO         0:0:0    0
h_data              h_data     MEMORY      <=    YES         NO         0        0
h_fsize             h_fsize    MEMORY      <=    YES         NO         0        0
h_rss               h_rss      MEMORY      <=    YES         NO         0        0
h_rt                h_rt       TIME        <=    YES         NO         0:0:0    0
h_stack             h_stack    MEMORY      <=    YES         NO         0        0
h_vmem              h_vmem     MEMORY      <=    YES         YES        0        0
hostname            h          HOST        ==    YES         NO         NONE     0
load_avg            la         DOUBLE      >=    NO          NO         0        0
load_long           ll         DOUBLE      >=    NO          NO         0        0
load_medium         lm         DOUBLE      >=    NO          NO         0        0
load_short          ls         DOUBLE      >=    NO          NO         0        0
m_core              core       INT         <=    YES         NO         0        0
m_socket            socket     INT         <=    YES         NO         0        0
m_topology          topo       RESTRING    ==    YES         NO         NONE     0
m_topology_inuse    utopo      RESTRING    ==    YES         NO         NONE     0
mem_free            mf         MEMORY      <=    YES         NO         0        0
mem_total           mt         MEMORY      <=    YES         NO         0        0
mem_used            mu         MEMORY      >=    YES         NO         0        0
min_cpu_interval    mci        TIME        <=    NO          NO         0:0:0    0
np_load_avg         nla        DOUBLE      >=    NO          NO         0        0
np_load_long        nll        DOUBLE      >=    NO          NO         0        0
np_load_medium      nlm        DOUBLE      >=    NO          NO         0        0
np_load_short       nls        DOUBLE      >=    NO          NO         0        0
num_proc            p          INT         ==    YES         NO         0        0
qname               q          RESTRING    ==    YES         NO         NONE     0
rerun               re         BOOL        ==    NO          NO         0        0
s_core              s_core     MEMORY      <=    YES         NO         0        0
s_cpu               s_cpu      TIME        <=    YES         NO         0:0:0    0
s_data              s_data     MEMORY      <=    YES         NO         0        0
s_fsize             s_fsize    MEMORY      <=    YES         NO         0        0
s_rss               s_rss      MEMORY      <=    YES         NO         0        0
s_rt                s_rt       TIME        <=    YES         NO         0:0:0    0
s_stack             s_stack    MEMORY      <=    YES         NO         0        0
s_vmem              s_vmem     MEMORY      <=    YES         YES        0        0
seq_no              seq        INT         ==    NO          NO         0        0
slots               s          INT         <=    YES         YES        1        1000
swap_free           sf         MEMORY      <=    YES         NO         0        0
swap_rate           sr         MEMORY      >=    YES         NO         0        0
swap_rsvd           srsv       MEMORY      >=    YES         NO         0        0
swap_total          st         MEMORY      <=    YES         NO         0        0
swap_used           su         MEMORY      >=    YES         NO         0        0
tmpdir              tmp        RESTRING    ==    NO          NO         NONE     0
virtual_free        vf         MEMORY      <=    YES         NO         0        0
virtual_total       vt         MEMORY      <=    YES         NO         0        0
virtual_used        vu         MEMORY      >=    YES         NO         0        0
# >#< starts a comment but comments are not saved across edits --------
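
A quick way to confirm which complexes ended up consumable is to filter the table with awk. This is only a sketch: list_consumables is a made-up name, and the sample rows are copied from the output above so that the script runs without a cluster.

```shell
#!/bin/sh
# list_consumables reads `qconf -sc` style output on stdin and prints the
# name of every complex whose "consumable" column (field 6) is not NO.
# Comment lines (starting with #) and short lines are skipped.
list_consumables() {
    awk '!/^#/ && NF >= 8 && $6 != "NO" { print $1 }'
}

# Sample rows copied from the table above. On a live cluster one would run:
#   qconf -sc | list_consumables
list_consumables <<'EOF'
#name               shortcut   type        relop requestable consumable default  urgency
h_vmem              h_vmem     MEMORY      <=    YES         YES        0        0
mem_total           mt         MEMORY      <=    YES         NO         0        0
s_vmem              s_vmem     MEMORY      <=    YES         YES        0        0
slots               s          INT         <=    YES         YES        1        1000
EOF
```

For the sample rows this prints h_vmem, s_vmem and slots, one per line.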

2. Submitted my job to the smp.q queue; it complained about the same problem:

qsub -S /bin/bash -A assembly -q smp.q -pe newPe 16 -l h_vmem=2GB -cwd -N "pBcR_correct_asm" -j y -o runCorrection.sh

Unable to run job: attribute "h_vmem" is not a memory value.
Exiting.

3. Information for smp.q. I think "complex_values" should be changed and "h_vmem" can stay unchanged:

qconf -sq smp.q

qname                 smp.q
hostlist              @smp.q
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               make newPe
rerun                 FALSE
slots                 160
tmpdir                /tmp
shell                 /bin/csh
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY

4. Information for a host in @smp.q:

qconf -sconf smp03.local

#smp03.local:
mailer                       /bin/mail
xterm                        /usr/bin/X11/xterm
execd_spool_dir              /opt/gridengine/default/spool

5. Global configuration. Should I add h_vmem and s_vmem here?

qconf -sconf

#global:
execd_spool_dir              /opt/gridengine/default/spool
mailer                       /bin/mail
xterm                        /usr/bin/X11/xterm
load_sensor                  none
prolog                       none
epilog                       none
shell_start_mode             posix_compliant
login_shells                 sh,ksh,csh,tcsh
min_uid                      0
min_gid                      0
user_lists                   none
xuser_lists                  none
projects                     none
xprojects                    none
enforce_project              false
enforce_user                 auto
load_report_time             00:00:40
max_unheard                  00:05:00
reschedule_unknown           00:00:00
loglevel                     log_warning
administrator_mail           none
set_token_cmd                none
pag_cmd                      none
token_extend_time            none
shepherd_cmd                 none
qmaster_params               none
execd_params                 ENABLE_ADDGRP_KILL=TRUE H_MEMORYLOCKED=infinity
reporting_params             accounting=true reporting=true \
                             flush_time=00:00:15 joblog=true sharelog=00:00:00
finished_jobs                100
gid_range                    20000-20100
qlogin_command               builtin
qlogin_daemon                builtin
rlogin_command               builtin
rlogin_daemon                builtin
rsh_command                  builtin
rsh_daemon                   builtin
max_aj_instances             2000
max_aj_tasks                 75000
max_u_jobs                   0
max_jobs                     0
max_advance_reservations     0
auto_user_oticket            0
auto_user_fshare             0
auto_user_default_project    none
auto_user_delete_time        86400
delegated_file_staging       false
reprioritize                 0
jsv_url                      none
jsv_allowed_mod              ac,h,i,e,o,j,M,N,p,w

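Answer 1 (score: 0)

What you probably want is h_vmem. At least that is the attribute I always use to specify how much memory I want my job to request.

See:

http://gridscheduler.sourceforge.net/htmlman/htmlman5/queue_conf.html?pathrev=V62u5_TAG

Specifically,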

     The resource limit parameters s_vmem and h_vmem are implemented
     by Sun Grid Engine as a job limit. They impose a limit on the
     amount of combined virtual memory consumed by all the processes
     in the job. If h_vmem is exceeded by a job running in the queue,
     it is aborted via a SIGKILL signal (see kill(1)). If s_vmem is
     exceeded, the job is sent a SIGXCPU signal which can be caught by
     the job. If you wish to allow a job to be "warned" so it can exit
     gracefully before it is killed then you should set the s_vmem
     limit to a lower value than h_vmem. For parallel processes, the
     limit is applied per slot which means that the limit is
     multiplied by the number of slots being used by the job before
     being applied.
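
To make the per-slot rule concrete, here is a small arithmetic sketch. The helper total_h_vmem_gb is made up for illustration; the numbers (2G per slot, 16 slots) are taken from the qsub commands in the question.

```shell
#!/bin/sh
# Hypothetical helper: total h_vmem in gigabytes for a parallel job, given
# the per-slot request in gigabytes and the number of slots.
total_h_vmem_gb() {
    per_slot_gb=$1
    slots=$2
    echo $(( per_slot_gb * slots ))
}

# 16 slots at 2G each, as in "-pe threads 16 -l h_vmem=2g"
echo "$(total_h_vmem_gb 2 16)G"   # prints 32G
```

So a request of h_vmem=2g with a 16-slot parallel environment would correspond to a 32G limit for the job as a whole, going by the quoted paragraph.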

Also, you may need to use qconf to set it as a consumable.
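
A rough sketch of what that could look like follows. It is a dry run: the functions only print the qconf commands rather than executing them. The host name smp03.local comes from the question, while the 64G capacity is an invented value that would have to be replaced with the real memory of the host. Whether a per-host capacity is needed on a given cluster is an assumption on my part.

```shell
#!/bin/sh
# Dry-run sketch: each function only PRINTS the qconf command it stands for.
# The 64G capacity used below is a made-up value, not taken from the question.

# Step 1: open the complex table in an editor so that the "consumable"
# column of h_vmem can be set to YES (interactive, so only printed here).
show_edit_complex_cmd() {
    echo "qconf -mc"
}

# Step 2: declare how much h_vmem a given execution host offers.
show_set_host_capacity_cmd() {
    host=$1
    capacity=$2
    echo "qconf -mattr exechost complex_values h_vmem=$capacity $host"
}

show_edit_complex_cmd
show_set_host_capacity_cmd smp03.local 64G
```

Printing instead of executing keeps the sketch safe to run anywhere. On a real cluster the printed commands would be run by an SGE administrator after checking them against the local qconf man page.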