Question

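I have a program that creates a subprocess inside a thread, so that the thread can continuously check for specific output conditions (from either stdout or stderr) and call the appropriate callbacks while the rest of the program carries on. Here is a pared-down version of that code: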

import select
import subprocess
import threading

def run_task():
    command = ['python', 'a-script-that-outputs-lines.py']
    proc = subprocess.Popen(command, stdout = subprocess.PIPE, stderr = subprocess.PIPE)
    while True:

        ready, _, _ = select.select((proc.stdout, proc.stderr), (), (), .1)

        if proc.stdout in ready:
            next_line_to_process = proc.stdout.readline()
            # process the output

        if proc.stderr in ready:
            next_line_to_process = proc.stderr.readline()
            # process the output

        if not ready and proc.poll() is not None:
            break

thread = threading.Thread(target = run_task)
thread.run()

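It works reasonably well, but I would like the thread to exit once two conditions are met: the running child process has finished, and all of the data in stdout and stderr has been processed.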

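The difficulty I have is that if my last condition is as it is above (if not ready and proc.poll() is not None), the thread never exits, because once the file descriptors for stdout and stderr are marked as ready they never become unready (even after all of the data has been read from them, at which point read() would hang or readline() would return an empty string).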
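The behaviour described here can be reproduced without a subprocess. The following is a minimal sketch, assuming a POSIX system and using a plain os.pipe() as a stand-in for the child's stdout; the function name probe_pipe_at_eof is made up for illustration. Once the write end is closed, select() keeps reporting the read end as ready, and os.read() returns an empty bytes object, which is the EOF signal.

```python
import os
import select

def probe_pipe_at_eof():
    """Return (ready_before, data, ready_after, tail) for a pipe whose writer has closed."""
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"last line\n")
    os.close(write_fd)  # the writer goes away, like a child process that has exited

    # Data is still buffered in the pipe, so the read end is readable.
    ready_before, _, _ = select.select([read_fd], [], [], 0.1)
    data = os.read(read_fd, 1024)

    # Nothing is left, yet select() still reports the descriptor as ready:
    # "ready" means "a read will not block", and a read at EOF returns at once.
    ready_after, _, _ = select.select([read_fd], [], [], 0.1)
    tail = os.read(read_fd, 1024)  # b"" signals EOF

    os.close(read_fd)
    return bool(ready_before), data, bool(ready_after), tail
```

In other words, "ready" after the child has exited does not imply that more data is coming; only the empty read tells you the pipe is finished.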

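If I change that condition to just if proc.poll() is not None, the loop exits as soon as the program exits, and I can't guarantee that it has seen all of the data that needs to be processed.

Is this simply the wrong approach, or is there a way to reliably determine when you have read all of the data that will ever be written to a file descriptor? Or is this an issue specific to reading from the stderr/stdout of a subprocess?

I have been trying this on Python 2.5 (running on OS X), and I have also tried select.poll()- and select.epoll()-based variants on Python 2.6 (running on Debian with a 2.6 kernel).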

Answer 1

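As I mentioned above, my eventual solution was the following, in case it is helpful to anyone. I think this is the right approach, since I am now 97.2% sure you can't do this using only select()/poll() and read():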

import select
import subprocess
import threading

def run_task():
    command = ['python', 'a-script-that-outputs-lines.py']
    proc = subprocess.Popen(command, stdout = subprocess.PIPE, stderr = subprocess.PIPE)
    while True:

        ready, _, _ = select.select((proc.stdout, proc.stderr), (), (), .1)

        if proc.stdout in ready:
            next_line_to_process = proc.stdout.readline()
            if next_line_to_process:
                pass  # process the output
            elif proc.returncode is not None:
                # The program has exited, and we have read everything written to stdout
                ready = [x for x in ready if x is not proc.stdout]

        if proc.stderr in ready:
            next_line_to_process = proc.stderr.readline()
            if next_line_to_process:
                pass  # process the output
            elif proc.returncode is not None:
                # The program has exited, and we have read everything written to stderr
                ready = [x for x in ready if x is not proc.stderr]

        if proc.poll() is not None and not ready:
            break

thread = threading.Thread(target = run_task)
thread.start()  # start() runs run_task in a new thread; run() would execute it in the calling thread

Answer 2

The select module is appropriate if you want to find out whether you can read from a pipe without blocking.

To make sure that you have read all of the data, use a simpler condition, if proc.poll() is not None: break, and call rest = [pipe.read() for pipe in [p.stdout, p.stderr]] after the loop.
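The "simple condition plus a final read" idea can be sketched roughly as follows. This assumes a POSIX system where select() works on pipes; the function name run_and_collect and the chunk_size parameter are made up for illustration, and the sketch collects raw bytes instead of calling callbacks.

```python
import os
import select
import subprocess
import sys

def run_and_collect(command, chunk_size=4096):
    """Run command and return (returncode, stdout_bytes, stderr_bytes)."""
    proc = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    chunks = {proc.stdout: [], proc.stderr: []}
    while True:
        ready, _, _ = select.select([proc.stdout, proc.stderr], [], [], 0.1)
        for pipe in ready:
            # os.read() returns whatever is available without waiting for a full line.
            chunks[pipe].append(os.read(pipe.fileno(), chunk_size))
        if proc.poll() is not None:
            break
    # The child has exited: read() now runs to EOF and picks up anything left over.
    for pipe in (proc.stdout, proc.stderr):
        chunks[pipe].append(pipe.read())
        pipe.close()
    return proc.returncode, b"".join(chunks[proc.stdout]), b"".join(chunks[proc.stderr])
```

A call such as run_and_collect([sys.executable, "-c", "print('hi')"]) should return the exit status together with everything the child wrote. Note that the final read() can still block if some other process has inherited the pipes and keeps them open.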

A subprocess is unlikely to close its stdout/stderr before it exits, so for simplicity you could skip the logic that handles EOF.

Don't call Thread.run() directly; use Thread.start() instead. You probably don't need a separate thread here at all.
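The difference between the two calls is easy to observe. In this small sketch (the helper names and thread names are made up for illustration), the target records the name of the thread it runs in:

```python
import threading

def record_thread_name(results):
    # Append the name of whichever thread is executing this function.
    results.append(threading.current_thread().name)

def compare_run_and_start():
    results = []

    worker = threading.Thread(target=record_thread_name, args=(results,), name="worker-run")
    worker.run()  # executes the target right here, in the calling thread

    worker = threading.Thread(target=record_thread_name, args=(results,), name="worker-start")
    worker.start()  # executes the target in a new thread
    worker.join()

    return results
```

The first entry is the caller's own thread name, because run() never created a thread; only the second entry carries the worker's name.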

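Don't call p.stdout.readline() after select(), because it may block; use os.read(p.stdout.fileno(), limit) instead. An empty bytestring indicates EOF for the corresponding pipe.

As an alternative, or in addition, you can make the pipes non-blocking using the fcntl module, and handle IO/OS errors while reading.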
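A minimal sketch of the fcntl approach, assuming a POSIX system. The helper names make_nonblocking and read_available are chosen here for illustration. The first sets O_NONBLOCK on a descriptor, and the second treats the "would block" error as "no data yet".

```python
import errno
import fcntl
import os

def make_nonblocking(fd):
    """Set O_NONBLOCK on fd while keeping its other status flags."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

def read_available(fd, limit=4096):
    """Return the bytes read, b"" at EOF, or None if no data is available right now."""
    try:
        return os.read(fd, limit)
    except OSError as exc:
        # A non-blocking read with nothing to deliver fails with EAGAIN/EWOULDBLOCK.
        if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            return None
        raise
```

With a subprocess, you would call make_nonblocking(p.stdout.fileno()) once after creating it, then use read_available() in the loop; the three return values distinguish "data", "end of stream" and "try again later".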

Answer 3

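Instead of using readline(), you can do a raw os.read(fd, size) on the pipe's file descriptor. Once select() has reported the descriptor as readable, this call does not block, and it also detects EOF (in which case it returns an empty string or bytes object). You have to implement the line splitting and buffering yourself. Use something like this: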

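import os

class NonblockingReader:
    """Splits the data read from a pipe into lines, buffering any partial line."""

    def __init__(self, pipe):
        self.fd = pipe.fileno()
        self.buffer = b""  # os.read() returns bytes on Python 3
        self.linesep = os.linesep.encode()

    def readlines(self):
        # Returns None at EOF, otherwise a (possibly empty) list of complete lines.
        data = os.read(self.fd, 2048)
        if not data:
            return None

        self.buffer += data
        if self.linesep in self.buffer:
            lines = self.buffer.split(self.linesep)
            self.buffer = lines[-1]  # keep the trailing partial line for the next call
            return lines[:-1]
        else:
            return []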
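One way to drive this kind of reader from select() is sketched below, assuming Python 3 on a POSIX system. The generator name iter_output_lines is made up for illustration, and the sketch splits on b"\n" instead of os.linesep. It stops watching each pipe when os.read() reports EOF, so the loop ends only after both streams have been fully consumed.

```python
import os
import select
import subprocess
import sys

def iter_output_lines(proc, chunk_size=2048):
    """Yield (stream_name, line) pairs until both pipes reach EOF."""
    # Maps each watched descriptor to [stream name, bytes of the pending partial line].
    buffers = {proc.stdout.fileno(): ["stdout", b""],
               proc.stderr.fileno(): ["stderr", b""]}
    while buffers:
        ready, _, _ = select.select(list(buffers), [], [], 0.1)
        for fd in ready:
            name, pending = buffers[fd]
            data = os.read(fd, chunk_size)
            if not data:
                # EOF: flush an unterminated last line and stop watching this pipe.
                if pending:
                    yield name, pending
                del buffers[fd]
                continue
            *lines, rest = (pending + data).split(b"\n")
            buffers[fd][1] = rest
            for line in lines:
                yield name, line
    proc.wait()  # both pipes are closed on the child's side; collect the exit status
```

A caller would create the process with stdout=subprocess.PIPE and stderr=subprocess.PIPE, then write for name, line in iter_output_lines(proc): and dispatch its callbacks inside that loop.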

Checking whether there is more data to read from a file descriptor using Python's select module
