I am trying to use Python's subprocess module to communicate with a process that reads standard input and writes standard output in a streaming fashion. I want the subprocess to read lines from an iterator that generates the input, and then I want to read the output lines from the subprocess. There may not be a one-to-one correspondence between input and output lines. How can I feed a subprocess from an arbitrary iterator that yields strings?
Here is some example code that gives a simple test case, along with some approaches I have tried that do not work for one reason or another:
#!/usr/bin/python
from subprocess import *
# A really big iterator
input_iterator = ("hello %s\n" % x for x in xrange(100000000))
# I thought that stdin could be any iterable, but it actually wants a
# filehandle, so this fails with an error.
subproc = Popen("cat", stdin=input_iterator, stdout=PIPE)
# This works, but it first sends *all* the input at once, then returns
# *all* the output as a string, rather than giving me an iterator over
# the output. This uses up all my memory, because the input is several
# hundred million lines.
subproc = Popen("cat", stdin=PIPE, stdout=PIPE)
output, error = subproc.communicate("".join(input_iterator))
output_lines = output.split("\n")
So how can I have the subprocess read line by line from the iterator while I read its stdout line by line?
Answer 0 (score: 5)
The easy way seems to be to fork and feed the input handle from the child process. Can anyone elaborate on any possible downsides of doing this? Or is there a Python module that makes it easier and safer?
#!/usr/bin/python
from subprocess import *
import os

def fork_and_input(input, handle):
    """Send input to handle in a child process."""
    # Make sure input is iterable before forking
    input = iter(input)
    if os.fork():
        # Parent
        handle.close()
    else:
        # Child
        try:
            handle.writelines(input)
            handle.close()
        # An IOError here means some *other* part of the program
        # crashed, so don't complain here.
        except IOError:
            pass
        os._exit(0)

# A really big iterator
input_iterator = ("hello %s\n" % x for x in xrange(100000000))

subproc = Popen("cat", stdin=PIPE, stdout=PIPE)
fork_and_input(input_iterator, subproc.stdin)
for line in subproc.stdout:
    print line,
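One downside of the fork() approach is that the child inherits a full copy of the interpreter's state, and errors in the feeder are hard to observe from the parent. A feeder thread avoids the fork entirely. Below is a minimal Python 3 sketch of that idea; it keeps the `cat` command and an `input_iterator` like the ones above, but shrinks the iterator to a small demo size, and the `feed` helper is mine, not from the original post:

```python
#!/usr/bin/env python3
# Thread-based alternative to forking a feeder process: a second
# thread writes the iterator to the child's stdin while the main
# thread reads the child's stdout line by line.
import threading
from subprocess import Popen, PIPE

def feed(handle, lines):
    """Write each string from `lines` to the binary `handle`, then close it."""
    try:
        for line in lines:
            handle.write(line.encode())
        handle.close()
    except BrokenPipeError:
        pass  # the child exited early; nothing more to do

input_iterator = ("hello %d\n" % x for x in range(1000))  # demo size

proc = Popen(["cat"], stdin=PIPE, stdout=PIPE)
feeder = threading.Thread(target=feed, args=(proc.stdin, input_iterator))
feeder.start()

count = 0
for line in proc.stdout:
    count += 1

feeder.join()
proc.wait()
print(count)  # prints 1000
```

Because the feeder runs concurrently with the reading loop, neither side can fill its pipe buffer and deadlock the other, which is the usual failure mode of writing all input before reading any output.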
Answer 1 (score: 2)
To feed a subprocess's standard input from a Python iterator:

#!/usr/bin/env python3
from subprocess import Popen, PIPE

with Popen("sink", stdin=PIPE, bufsize=-1) as process:
    for chunk in input_iterator:
        process.stdin.write(chunk)

(Here "sink" stands for whatever command consumes the input.) If you want to read the output at the same time, you need threads or asyncio:
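The asyncio route can be sketched with `asyncio.create_subprocess_exec`, which lets the feeding and reading coroutines run concurrently without explicit threads. This is my own minimal Python 3 sketch, again using `cat` as a stand-in command and a small demo iterator:

```python
#!/usr/bin/env python3
# Feed a subprocess from an iterator and read its output concurrently
# using asyncio's subprocess support (no explicit threads needed).
import asyncio

async def run(cmd, lines):
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
    )

    async def feed():
        for line in lines:
            proc.stdin.write(line.encode())
            await proc.stdin.drain()  # respect pipe flow control
        proc.stdin.close()

    async def consume():
        out = []
        while True:
            line = await proc.stdout.readline()
            if not line:  # EOF
                break
            out.append(line.decode())
        return out

    # Run the writer and the reader concurrently.
    _, output = await asyncio.gather(feed(), consume())
    await proc.wait()
    return output

lines = ("hello %d\n" % x for x in range(100))
output = asyncio.run(run(["cat"], lines))
print(len(output))  # prints 100
```

`drain()` applies backpressure, so the writer pauses whenever the child is slow to consume, and the reader keeps draining stdout in the meantime.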
Answer 2 (score: 0)
Follow this recipe; it is an add-on to subprocess that supports asynchronous I/O. However, this still requires your subprocess to respond to each input line, or group of lines, with part of its output.
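That per-line-response constraint is what makes a plain blocking request/response loop safe: you can write a line, flush, and block on `readline()` only if the child is guaranteed to answer each line. A small Python 3 illustration of my own, using an unbuffered upper-casing echo child as a stand-in for a real line-oriented filter:

```python
#!/usr/bin/env python3
# A strict request/response loop over pipes. This only avoids deadlock
# because the child (an unbuffered upper-casing echo, standing in for a
# real filter) emits exactly one output line per input line.
import sys
from subprocess import Popen, PIPE

child_src = "import sys\nfor line in sys.stdin: sys.stdout.write(line.upper())"
proc = Popen([sys.executable, "-u", "-c", child_src], stdin=PIPE, stdout=PIPE)

responses = []
for word in ["alpha\n", "beta\n", "gamma\n"]:
    proc.stdin.write(word.encode())
    proc.stdin.flush()                                 # push the line to the child now
    responses.append(proc.stdout.readline().decode())  # blocks until it replies

proc.stdin.close()
proc.wait()
print(responses)
```

If the child instead buffered its output or answered only after several input lines, `readline()` here would block forever, which is exactly why the threads/asyncio approaches above are needed in the general case.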