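In a shell script, we have the following command:

    /script1.pl < input_file | /script2.pl > output_file

I would like to replicate this stream in Python using the subprocess module. input_file is a large file, and I can't read the whole file at once. So I want to pass each line, input_string, into the pipe stream and get back a string variable, output_string, until the whole file has been streamed.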
Here is a first attempt:

    process = subprocess.Popen(["/script1.pl | /script2.pl"], stdin = subprocess.PIPE, stdout = subprocess.PIPE, shell = True)
    process.stdin.write(input_string)
    output_string = process.communicate()[0]

However, using process.communicate()[0] closes the stream. I want to keep the stream open for future input. I tried process.stdout.readline() instead, but the program hangs.
Answer 0: (score: 1)
To emulate the /script1.pl < input_file | /script2.pl > output_file shell command using the subprocess module in Python:

    #!/usr/bin/env python
    from subprocess import check_call

    with open('input_file', 'rb') as input_file:
        with open('output_file', 'wb') as output_file:
            # the shell builds the pipeline; the open files become its stdin/stdout
            check_call("/script1.pl | /script2.pl", shell=True,
                       stdin=input_file, stdout=output_file)
You could write it without shell=True (though I don't see a reason to here), based on the "17.1.4.2. Replacing shell pipeline" example from the docs:

    #!/usr/bin/env python
    from subprocess import Popen, PIPE

    with open('input_file', 'rb') as input_file:
        script1 = Popen("/script1.pl", stdin=input_file, stdout=PIPE)
    with open("output_file", "wb") as output_file:
        script2 = Popen("/script2.pl", stdin=script1.stdout, stdout=output_file)
    script1.stdout.close()  # allow script1 to receive SIGPIPE if script2 exits
    script2.wait()
    script1.wait()
You could also use the plumbum module to get shell-like syntax in Python:

    #!/usr/bin/env python
    from plumbum import local

    script1, script2 = local["/script1.pl"], local["/script2.pl"]
    (script1 < "input_file" | script2 > "output_file")()
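See also "How do I use subprocess.Popen to connect multiple processes by pipes?"

If you want to read/write line by line, then the answer depends on the concrete scripts that you want to run. In general, it is easy to deadlock while sending input and receiving output if you are not careful, because of buffering issues.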
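
To make the buffering issue concrete, here is a minimal Python 3 sketch of the lock-step case (write one line, read one line back, keep the stream open). It only works under the assumption that the child flushes its stdout after every line. The CHILD command is a hypothetical stand-in for the "/script1.pl | /script2.pl" pipeline, and the helper names open_filter, exchange and close_filter are made up for illustration. If the child block-buffers its output instead, which many programs do when stdout is a pipe, the readline() call simply blocks, which matches the hang described in the question.

```python
import sys
from subprocess import Popen, PIPE

# Hypothetical stand-in child: upper-cases each line and flushes right away.
CHILD = [sys.executable, "-u", "-c",
         "import sys\n"
         "for line in sys.stdin:\n"
         "    sys.stdout.write(line.upper())\n"
         "    sys.stdout.flush()\n"]


def open_filter(cmd=CHILD):
    """Start the child with pipes on both ends, in line-buffered text mode."""
    return Popen(cmd, stdin=PIPE, stdout=PIPE, bufsize=1,
                 universal_newlines=True)


def exchange(process, input_string):
    """Send one line and wait for one line back; the stream stays open."""
    process.stdin.write(input_string + "\n")
    process.stdin.flush()  # push the line to the child immediately
    # Blocks until the child emits a full line (assumes it flushes per line).
    return process.stdout.readline().rstrip("\n")


def close_filter(process):
    """Signal end of input, wait for the child to exit, release the pipe."""
    process.stdin.close()
    returncode = process.wait()
    process.stdout.close()
    return returncode
```

This one-in, one-out pattern also assumes the child produces exactly one output line per input line; a filter that drops or merges lines would leave exchange() waiting.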
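If the input doesn't depend on the output in your case, then a reliable cross-platform approach is to use a separate thread for each stream: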

    #!/usr/bin/env python
    # Note: Python 2 syntax (xrange, print statement)
    from subprocess import Popen, PIPE
    from threading import Thread

    def pump_input(pipe):
        try:
            for i in xrange(1000000000):  # generate large input
                print >>pipe, i
        finally:
            pipe.close()

    p = Popen("/script1.pl | /script2.pl", shell=True, stdin=PIPE, stdout=PIPE,
              bufsize=1)
    # feed the child's stdin from a background thread
    Thread(target=pump_input, args=[p.stdin]).start()
    try:  # read output line by line as soon as the child flushes its stdout buffer
        for line in iter(p.stdout.readline, b''):
            print line.strip()[::-1]  # print reversed lines
    finally:
        p.stdout.close()
        p.wait()
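
The listing above is written for Python 2. A rough Python 3 sketch of the same thread-per-stream idea follows. The helper name stream_lines and the DEMO_CMD child are assumptions made for illustration, not part of the original answer, and the pipes are used in binary mode, so input and output lines are bytes.

```python
import sys
from subprocess import Popen, PIPE
from threading import Thread

# Hypothetical stand-in for "/script1.pl | /script2.pl": upper-cases stdin.
DEMO_CMD = [sys.executable, "-c",
            "import sys\n"
            "for line in sys.stdin:\n"
            "    sys.stdout.write(line.upper())\n"]


def pump_input(pipe, lines):
    """Write every input line (bytes) to the child's stdin, then close it."""
    try:
        for line in lines:
            pipe.write(line)
    except BrokenPipeError:
        pass  # the child exited before consuming all input
    finally:
        try:
            pipe.close()  # close() flushes, which can also hit a broken pipe
        except BrokenPipeError:
            pass


def stream_lines(cmd, lines, shell=False):
    """Yield the child's output lines while a thread feeds its input."""
    p = Popen(cmd, shell=shell, stdin=PIPE, stdout=PIPE)
    feeder = Thread(target=pump_input, args=(p.stdin, lines))
    feeder.start()
    try:
        for line in p.stdout:  # each line arrives once the child flushes it
            yield line
    finally:
        p.stdout.close()
        feeder.join()
        p.wait()


# Possible usage with the commands from the question (not run here):
# with open("input_file", "rb") as f:
#     for output_string in stream_lines("/script1.pl | /script2.pl", f, shell=True):
#         ...
```

Because the input is consumed lazily from an iterable (an open file yields one line at a time), the whole file never has to be held in memory.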