How can I read all available data from subprocess.Popen.stdout (non-blocking)?

Asked: 2010-06-19 17:34:50

Tags: python subprocess

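I need a way either to read all characters currently available in a stream created by Popen, or to find out how many characters are left in the buffer.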

Background: I want to remote-control an interactive application from Python. So far I have used Popen to create a new subprocess:

process=subprocess.Popen(["python"],shell=True,stdin=subprocess.PIPE,stdout=subprocess.PIPE,stderr=subprocess.PIPE, cwd=workingDir)

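(I am not really starting python, but the actual interactive interface is similar.) At the moment I read one byte at a time until I detect that the process has reached the command prompt: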

output = ""
while output[-6:]!="SCIP> ":
    output += process.stdout.read(1)
    sys.stdout.write(output[-1])
return output

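Then I start a lengthy calculation via process.stdin.write("command\n"). My problem is that I cannot check whether the calculation has finished, because I cannot check whether the last characters in the stream are the prompt. read() or read(n) blocks my thread until it reaches EOF, which it never will, because the interactive program does not exit until it is told to. Looking for the prompt the way the loop above does will not work either, because the prompt only appears after the calculation.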

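The ideal solution would let me read all available characters from the stream, and would immediately return an empty string if there is nothing to read.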

6 answers:

Answer 0 (score: 11)

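Incremental parsing of Popen's stdout is really not a problem. Just hand a pipe to a thread and have it scrub through the output, looking for delimiters. Depending on your preference, it can forward the data into another pipe or file-like object, or put the parsed "chunks" on a "stack" in asynchronous mode. Here is an example of asynchronous "chunking" of stdout based on a custom delimiter: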

import cStringIO
import uuid
import threading
import os

class InputStreamChunker(threading.Thread):
    '''
    Threaded object / code that mediates reading output from a stream,
    detects "separation markers" in the stream and spits out chunks
    of original stream, split when ends of chunk are encountered.

    Results are made available as a list of filled file-like objects
    (your choice). Results are accessible either "asynchronously"
    (you can poll at will for results in a non-blocking way) or
    "synchronously" by exposing a "subscribe and wait" system based
    on threading.Event flags.

    Usage:
    - instantiate this object
    - give our input pipe as "stdout" to other subprocess and start it:
        Popen(..., stdout = th.input, ...)
    - (optional) subscribe to data_available event
    - pull resulting file-like objects off .data
      (if you are "messing" with .data from outside of the thread,
       be courteous and wrap the thread-unsafe manipulations between:
       obj.data_unoccupied.clear()
       ... mess with .data
       obj.data_unoccupied.set()
       The thread will not touch obj.data for the duration and will
       block reading.)

    License: Public domain
    Absolutely no warranty provided
    '''
    def __init__(self, delimiter = None, outputObjConstructor = None):
        '''
        delimiter - the string that will be considered a delimiter for the stream
        outputObjConstructor - instances of these will be attached to self.data array
         (instantiator_pointer, args, kw)
        '''
        super(InputStreamChunker,self).__init__()

        self._data_available = threading.Event()
        self._data_available.clear() # parent will .wait() on this for results.
        self._data = []
        self._data_unoccupied = threading.Event()
        self._data_unoccupied.set() # parent will set this to true when self.results is being changed from outside
        self._r, self._w = os.pipe() # takes all inputs. self.input = public pipe in.
        self._stop = False
        if not delimiter: delimiter = str(uuid.uuid1())
        self._stream_delimiter = [l for l in delimiter]
        self._stream_roll_back_len = ( len(delimiter)-1 ) * -1
        if not outputObjConstructor:
            self._obj = (cStringIO.StringIO, (), {})
        else:
            self._obj = outputObjConstructor
    @property
    def data_available(self):
        '''returns a threading.Event instance pointer that is
        True (and non-blocking to .wait() ) when we attached a
        new IO obj to the .data array.
        Code consuming the array may decide to set it back to False
        if it's done with all chunks and wants to be blocked on .wait()'''
        return self._data_available
    @property
    def data_unoccupied(self):
        '''returns a threading.Event instance pointer that is normally
        True (and non-blocking to .wait() ) Set it to False with .clear()
        before you start non-thread-safe manipulations (changing) .data
        array. Set it back to True with .set() when you are done'''
        return self._data_unoccupied
    @property
    def data(self):
        '''returns a list of input chunks (file-like objects) captured
        so far. This is a "stack" of sorts. Code consuming the chunks
        would be responsible for disposing of the file-like objects.
        By default, the file-like objects are instances of cStringIO'''
        return self._data
    @property
    def input(self):
        '''This is a file descriptor (not a file-like).
        It's the input end of our pipe which you give to other process
        to be used as stdout pipe for that process'''
        return self._w
    def flush(self):
        '''Normally a read on a pipe is blocking.
        To get things moving (make the subprocess yield the buffer),
        we inject our chunk delimiter into self.input

        This is useful when primary subprocess does not write anything
        to our in pipe, but we need to make internal pipe reader let go
        of the pipe and move on with things.
        '''
        os.write(self._w, ''.join(self._stream_delimiter))
    def stop(self):
        self._stop = True
        self.flush() # reader has its teeth on the pipe. This makes it let go for a sec.
        os.close(self._w)
        self._data_available.set()
    def __del__(self):
        try:
            self.stop()
        except:
            pass
        try:
            del self._w
            del self._r
            del self._data
        except:
            pass
    def run(self):
        ''' Plan:
        - We read into a fresh instance of IO obj until marker encountered.
        - When marker is detected, we attach that IO obj to "results" array
          and signal the calling code (through threading.Event flag) that
          results are available
        - repeat until .stop() was called on the thread.
        '''
        marker = ['' for l in self._stream_delimiter] # '' is there on purpose
        tf = self._obj[0](*self._obj[1], **self._obj[2])
        while not self._stop:
            l = os.read(self._r, 1)
            print('Thread talking: Ordinal of char is:%s' %ord(l))
            trash_str = marker.pop(0)
            marker.append(l)
            if marker != self._stream_delimiter:
                tf.write(l)
            else:
                # chopping off the marker first
                tf.seek(self._stream_roll_back_len, 2)
                tf.truncate()
                tf.seek(0)
                self._data_unoccupied.wait(5) # seriously, how much time is needed to get your items off the stack?
                self._data.append(tf)
                self._data_available.set()
                tf = self._obj[0](*self._obj[1], **self._obj[2])
        os.close(self._r)
        tf.close()
        del tf

def waitforresults(ch, answers, expect):
    while len(answers) < expect:
        ch.data_available.wait(0.5); ch.data_unoccupied.clear()
        while ch.data:
            answers.append(ch.data.pop(0))
        ch.data_available.clear(); ch.data_unoccupied.set()
        print('Main talking: %s answers received, expecting %s\n' % ( len(answers), expect) )

def test():
    '''
    - set up chunker
    - set up Popen with chunker's output stream
    - push some data into proc.stdin
    - get results
    - cleanup
    '''

    import subprocess

    ch = InputStreamChunker('\n')
    ch.daemon = True
    ch.start()

    print('starting the subprocess\n')
    p = subprocess.Popen(
        ['cat'],
        stdin = subprocess.PIPE,
        stdout = ch.input,
        stderr = subprocess.PIPE)

    answers = []

    i = p.stdin
    i.write('line1 qwer\n') # will be in results
    i.write('line2 qwer\n') # will be in results
    i.write('line3 zxcv asdf') # will be in results only after a ch.flush(),
                                # prepended to other line or when the pipe is closed
    waitforresults(ch, answers, expect = 2)

    i.write('line4 tyui\n') # will be in results
    i.write('line5 hjkl\n') # will be in results
    i.write('line6 mnbv') # will be in results only after a ch.flush(),
                                # prepended to other line or when the pipe is closed
    waitforresults(ch, answers, expect = 4)

    ## now we will flush the rest of input (that last line did not have a delimiter)
    i.close()
    ch.flush()
    waitforresults(ch, answers, expect = 5)

    should_be = ['line1 qwer', 'line2 qwer',
        'line3 zxcv asdfline4 tyui', 'line5 hjkl', 'line6 mnbv']
    assert should_be == [i.read() for i in answers]

    # don't forget to stop the chunker. It closes the pipes
    p.terminate()
    ch.stop()
    del p, ch

if __name__ == '__main__':
    test()

Edit: removed the erroneous wording about "writing to proc's stdin is a one-time thing".
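The class above is written for Python 2 (cStringIO, str-based pipes). As a rough Python 3 sketch of the same reader-thread idea, and not a port of the class, a background thread can push whatever it reads onto a queue that the main thread drains without blocking. The names start_reader and drain, and the use of None as an end-of-stream marker, are assumptions made for this illustration.

```python
import queue
import threading


def start_reader(stream, chunk_size=1):
    """Drain a binary `stream` in a daemon thread.

    Returns the queue that receives the chunks. `None` is queued once the
    stream reaches EOF (a hypothetical marker chosen for this sketch).
    """
    q = queue.Queue()

    def pump():
        # stream.read() blocks, but only inside this helper thread.
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            q.put(chunk)
        q.put(None)

    threading.Thread(target=pump, daemon=True).start()
    return q


def drain(q):
    """Return everything queued so far without blocking (b'' if nothing)."""
    parts = []
    while True:
        try:
            item = q.get_nowait()
        except queue.Empty:
            break
        if item is None:  # end-of-stream marker
            break
        parts.append(item)
    return b"".join(parts)
```

With a process started as in the question, q = start_reader(process.stdout) would be called once, and the main loop could then append drain(q) to its buffer and test whether the buffer ends with the prompt.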

Answer 1 (score: 4)

Poking around, I found this really nice solution:

Persistent python subprocess

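It avoids the blocking problem altogether by using fcntl to put the subprocess pipes into non-blocking mode, so no auxiliary threads or polling are required. I could be missing something, but it has solved my interactive process control problem.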
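A minimal sketch of the fcntl approach, assuming a POSIX system and Python 3 (where a read on an empty non-blocking pipe raises BlockingIOError). The helper names set_nonblocking and read_available are hypothetical and do not come from the linked solution.

```python
import fcntl
import os


def set_nonblocking(fd):
    """Switch a file descriptor to non-blocking mode (POSIX only)."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)


def read_available(fd, chunk_size=4096):
    """Return every byte currently waiting on `fd`, or b'' if there is none."""
    chunks = []
    while True:
        try:
            chunk = os.read(fd, chunk_size)
        except BlockingIOError:
            break  # pipe is empty right now
        if not chunk:
            break  # EOF: the writer closed its end
        chunks.append(chunk)
    return b"".join(chunks)
```

For the process in the question, one would call set_nonblocking(process.stdout.fileno()) once and then read_available(process.stdout.fileno()) whenever convenient. Note that this sketch returns b'' both when the pipe is empty and at EOF, so process.poll() is still needed to tell the two cases apart.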

Answer 2 (score: 2)

There is another possible solution, but it may require you to rearrange your program a bit.

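If you have multiple sources of I/O (file descriptors, sockets, etc.) and you want to wait on all of them at once, use the Python select module. You could, for instance, put standard input (for reading from the terminal) and the pipe (from the subprocess) in a list and wait for input to become ready on either of them. select blocks until I/O is available on any of the descriptors in the list. You then scan the list, looking for the ones that have data available.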

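This approach is quite efficient, much more so than polling a file descriptor to see whether there is any data. It also has the virtue of simplicity: you can accomplish what you want with a minimum of code. Simpler code means fewer opportunities for bugs.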
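As a rough illustration of the select idea, here is a sketch that assumes a POSIX system (on Windows, select only works on sockets). The function name read_ready and its parameters are hypothetical. A timeout of 0 turns the call into a non-blocking check; passing None instead makes it wait until something is readable.

```python
import os
import select


def read_ready(fds, timeout=0.0, chunk_size=4096):
    """Read once from each descriptor in `fds` that has data waiting.

    Returns a dict mapping descriptor -> bytes. Descriptors with nothing
    to read are left out; b'' for a descriptor means it reached EOF.
    """
    readable, _, _ = select.select(fds, [], [], timeout)
    # os.read will not block here because select reported the fd as readable.
    return {fd: os.read(fd, chunk_size) for fd in readable}
```

For the question's use case, read_ready([process.stdout.fileno()]) returns an empty dict right away while the calculation is still running, and returns the newly printed output once the program writes something.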

Answer 3 (score: 1)

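It is not quite correct that read() blocks until EOF: read(n) blocks only until it has the n bytes it asked for (only read() with no argument waits for EOF). On the other side, some data is probably still sitting in the child's buffers; output is not flushed just because a print ended with a newline.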

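Why not have the child print something like "### OVER ###\n" and then call stdout.flush(), and on the parent side collect output until you see the OVER token, for example with ''.join(i for i in iter(process.stdout.readline, '### OVER ###\n'))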
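The one-liner above never terminates if the child exits before printing the marker, because readline() then keeps returning an empty string. A slightly more defensive sketch, assuming a text-mode stream in Python 3, is shown below; the names SENTINEL and read_until_sentinel are hypothetical.

```python
SENTINEL = "### OVER ###\n"


def read_until_sentinel(stream, sentinel=SENTINEL):
    """Collect lines from a text `stream` until the sentinel line or EOF."""
    lines = []
    # readline() returns '' at EOF, which ends the iteration.
    for line in iter(stream.readline, ""):
        if line == sentinel:
            break
        lines.append(line)
    return "".join(lines)
```

This still blocks while the child is computing, so it only suits cases where the parent has nothing else to do until the marker arrives, and it requires that the child program can be made to print the marker and flush.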

Answer 4 (score: 1)

I tried a lot of approaches, such as making stdout non-blocking like this:

fd = output.fileno()
fl = fcntl.fcntl(fd, fcntl.F_GETFL)
fcntl.fcntl(fd, fcntl.F_SETFL, fl | os.O_NONBLOCK)

But the only solution that worked for me is the one described here:

master, slave = pty.openpty()

proc = subprocess.Popen(
    shlex.split(command), 
    stdout=slave, 
    stderr=slave, 
    close_fds=True, 
    bufsize=0
)

stdout = os.fdopen(master)

And then:

while True:
    out = stdout.readline()
    output_result = proc.poll()
    if out == '' and output_result is not None:
        break
    if out != '':
        print(out)

Answer 5 (score: 0)

I don't think readline() will block your process.

line = process.stdout.readline()

Previously I tried using

for line in process.stdout:
    print(line)

but that seemed to hang until the process terminated.