Using Python subprocess

Time: 2016-03-15 08:59:52

Tags: python-2.7 subprocess qsub

I want to submit jobs to a computer cluster via the SGE scheduler, using a pipe:

$ echo -e 'date; sleep 2; date' | qsub -cwd -j y -V -q all.q -N test 

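(The queue may differ depending on the particular cluster.)

Running this command line in a bash terminal works on the cluster I have access to, with GNU bash version 3.2.25, GE version 6.2u5 and Linux 2.6 x86_64.

In Python 2.7.2, here are my commands (the whole script is available as a gist):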

import subprocess
queue = "all.q"
jobName = "test"
cmd = "date; sleep 2; date"
echoArgs = ["echo", "-e", "'%s'" % cmd]
qsubArgs = ["qsub", "-cwd", "-j", "y", "-V", "-q", queue, "-N", jobName]

Case 1: using shell=True makes it work:

wholeCmd = " ".join(echoArgs) + " | " + " ".join(qsubArgs)
out = subprocess.Popen(wholeCmd, shell=True, stdout=subprocess.PIPE)
out = out.communicate()[0]
jobId = out.split()[2]

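But I would like to avoid it, for the security reasons explained in the official documentation.

Case 2: using the same code as above but with shell=False results in the following error message, so the job is not even submitted: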

Traceback (most recent call last):
  File "./test.py", line 22, in <module>
    out = subprocess.Popen(cmd, shell=False, stdout=subprocess.PIPE)
  File "/share/apps/lib/python2.7/subprocess.py", line 679, in __init__
    errread, errwrite)
  File "/share/apps/lib/python2.7/subprocess.py", line 1228, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

Case 3: therefore, following the official documentation as well as this post on SO, here is one proper way to do it:

echoProc = subprocess.Popen(echoArgs, stdout=subprocess.PIPE)
out = subprocess.check_output(qsubArgs, stdin=echoProc.stdout)
echoProc.wait()

The job is submitted successfully, but it returns the following error message:

/opt/gridengine/default/spool/compute-2-27/job_scripts/3873705: line 1: echo 3; date; sleep 2; date: command not found

This is what I don't understand.

Case 4: another way of doing it, following this post, is:

echoProc = subprocess.Popen(echoArgs, stdout=subprocess.PIPE)
qsubProc = subprocess.Popen(qsubArgs, stdin=echoProc.stdout, stdout=subprocess.PIPE)
echoProc.stdout.close()
out = qsubProc.communicate()[0]
echoProc.wait()

Here too the job is submitted successfully, but it returns the following error message:

/opt/gridengine/default/spool/compute-2-32/job_scripts/3873706: line 1: echo 4; date; sleep 2; date: command not found

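Did I make a mistake in my Python code? Could the problem come from the way Python or SGE were compiled and installed?

2 answers:

Answer 0: (score: 0)

You get "command not found" because 'echo 3; date; sleep 2; date' is interpreted as a single command. Without a shell, nothing strips the single quotes, so echo receives them literally and writes them into the job script; the shell that later runs the script then treats the whole quoted string as one command name.

Just change this line: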

echoArgs = ["echo", "-e", "'%s'" % cmd]

to:

echoArgs = ["echo", "-e", "%s" % cmd]

(That is, remove the single quotes.) This should make both case 3 and case 4 work (although it will break cases 1 and 2).
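To see the effect of the quotes in isolation, here is a minimal sketch (written for Python 3). It does not call echo or qsub; a child Python process that prints its first argument is used as a hypothetical stand-in for echo, and the helper name echo_without_shell is made up for this illustration.

```python
import subprocess
import sys

cmd = "date; sleep 2; date"

def echo_without_shell(text):
    """Return the first argument exactly as a child process receives it (no shell)."""
    # Hypothetical stand-in for `echo`: write argv[1] back unchanged.
    printer = "import sys; sys.stdout.write(sys.argv[1])"
    return subprocess.check_output([sys.executable, "-c", printer, text]).decode()

quoted = echo_without_shell("'%s'" % cmd)   # argument built as in the question
unquoted = echo_without_shell("%s" % cmd)   # argument built as suggested above

print(repr(quoted))    # the single quotes are part of the data
print(repr(unquoted))  # only the commands themselves
```

With the quotes, the text that would reach qsub starts and ends with a literal single quote, which is what the job's shell later tries to run as one command.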

Answer 1: (score: 0)

Your specific case can be implemented in Python 3:

#!/usr/bin/env python3
from subprocess import check_output

queue_name = "all.q"
job_name = "test"
cmd = b"date; sleep 2; date"
job_id = check_output('qsub -cwd -j y -V'.split() +
                      ['-q', queue_name, '-N', job_name],
                      input=cmd).split()[2]

You can adapt it for Python 2 using Popen.communicate().
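One possible shape for that adaptation is sketched below. The helper name check_output_with_input is an assumption made for illustration, not part of the subprocess module, and the stand-in child process exists only so the sketch can run on a machine without qsub.

```python
import subprocess
import sys

def check_output_with_input(args, data):
    """Run args without a shell, feed data to its stdin, return its stdout as bytes."""
    proc = subprocess.Popen(args, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    out = proc.communicate(data)[0]  # writes data, closes stdin, waits for exit
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, args)
    return out

# On the cluster this would be called roughly as follows (not run here):
#   args = ['qsub', '-cwd', '-j', 'y', '-V', '-q', queue_name, '-N', job_name]
#   job_id = check_output_with_input(args, b"date; sleep 2; date").split()[2]

# Hypothetical stand-in for qsub: a child process that copies stdin to stdout.
stand_in = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
echoed = check_output_with_input(stand_in, b"date; sleep 2; date")
print(echoed)
```

Because the job text is passed directly as standard input, no echo process and no quoting are needed at all.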

As far as I understand, whoever controls the input cmd can already run arbitrary commands, so there is not much point in avoiding shell=True here:

#!/usr/bin/env python
from pipes import quote as shell_quote
from subprocess import check_output

pipeline = 'echo -e {cmd} | qsub -cwd -j y -V -q {queue_name} -N {job_name}'
job_id = check_output(pipeline.format(
    cmd=shell_quote(cmd),
    queue_name=shell_quote(queue_name),
    job_name=shell_quote(job_name)),
                      shell=True).split()[2]
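The snippet above targets Python 2. As a small sketch of what the quoting step produces, the following uses shlex.quote, the Python 3 counterpart of pipes.quote (the pipes module was removed in Python 3.13). It only builds and prints the command line and does not run it; the variable values are the ones from the question.

```python
from shlex import quote as shell_quote

cmd = "date; sleep 2; date"
queue_name = "all.q"
job_name = "test"

pipeline = 'echo -e {cmd} | qsub -cwd -j y -V -q {queue_name} -N {job_name}'
command_line = pipeline.format(
    cmd=shell_quote(cmd),                # contains ';' and spaces, so it gets quoted
    queue_name=shell_quote(queue_name),  # only safe characters, left unchanged
    job_name=shell_quote(job_name))

print(command_line)
```

The quoting happens once, at the point where the string is handed to a shell, which is why the same single quotes that break cases 3 and 4 are correct here.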

Implementing a pipeline by hand is error-prone. If you don't want to run a shell, you can use the plumbum module, which supports a similar pipeline syntax embedded in pure Python:

#!/usr/bin/env python
from plumbum.cmd import echo, qsub # $ pip install plumbum

qsub_args = '-cwd -j y -V -q'.split() + [queue_name, '-N', job_name]
job_id = (echo['-e', cmd] | qsub[qsub_args])().split()[2]
# or (qsub[qsub_args] << cmd)()

See "How do I use subprocess.Popen to connect multiple processes by pipes?"
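For reference, the hand-written variant can be sketched as a small helper. The name pipe_two is hypothetical, and the producer and consumer below are stand-in child Python processes rather than echo and qsub, so the sketch runs without a cluster.

```python
import subprocess
import sys

def pipe_two(producer_args, consumer_args):
    """Run `producer | consumer` without a shell and return the consumer's stdout."""
    producer = subprocess.Popen(producer_args, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(consumer_args, stdin=producer.stdout,
                                stdout=subprocess.PIPE)
    # Close the parent's copy of the pipe so the producer can receive SIGPIPE
    # if the consumer exits before reading everything.
    producer.stdout.close()
    out = consumer.communicate()[0]
    producer.wait()
    return out

# Stand-ins: the producer prints the job text, the consumer copies stdin to stdout.
producer_args = [sys.executable, "-c", "print('date; sleep 2; date')"]
consumer_args = [sys.executable, "-c",
                 "import sys; sys.stdout.write(sys.stdin.read())"]

result = pipe_two(producer_args, consumer_args)
print(result)
```

Each argument list is passed to the child as-is, so the job text must not carry any shell quoting of its own.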