Python multiprocessing: synchronizing file-like objects

Asked: 2011-04-28 16:25:42

Tags: python multithreading multiprocessing python-2.6 python-multithreading

I'm trying to create a file-like object which gets assigned to sys.stdout/sys.stderr during testing, to provide deterministic output. It's not meant to be fast, just reliable. What I have so far almost works, but I need some help getting rid of the last few edge-case bugs.

Here is my current implementation:

try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO

from os import getpid
class MultiProcessFile(object):
    """
    helper for testing multiprocessing

    multiprocessing poses a problem for doctests, since the strategy
    of replacing sys.stdout/stderr with file-like objects then
    inspecting the results won't work: the child processes will
    write to the objects, but the data will not be reflected
    in the parent doctest-ing process.

    The solution is to create file-like objects which will interact with
    multiprocessing in a more desirable way.

    All processes can write to this object, but only the creator can read.
    This allows the testing system to see a unified picture of I/O.
    """
    def __init__(self):
        # per advice at:
        #    http://docs.python.org/library/multiprocessing.html#all-platforms
        from multiprocessing import Queue
        self.__master = getpid()
        self.__queue = Queue()
        self.__buffer = StringIO()
        self.softspace = 0

    def buffer(self):
        if getpid() != self.__master:
            return

        from Queue import Empty
        from collections import defaultdict
        cache = defaultdict(str)
        while True:
            try:
                pid, data = self.__queue.get_nowait()
            except Empty:
                break
            cache[pid] += data
        for pid in sorted(cache):
            self.__buffer.write( '%s wrote: %r\n' % (pid, cache[pid]) )
    def write(self, data):
        self.__queue.put((getpid(), data))
    def __iter__(self):
        "getattr doesn't work for iter()"
        self.buffer()
        return self.__buffer
    def getvalue(self):
        self.buffer()
        return self.__buffer.getvalue()
    def flush(self):
        "meaningless"
        pass

...and a quick test script:

#!/usr/bin/python2.6

from multiprocessing import Process
from mpfile import MultiProcessFile

def printer(msg):
    print msg

processes = []
for i in range(20):
    processes.append( Process(target=printer, args=(i,), name='printer') )

print 'START'
import sys
buffer = MultiProcessFile()
sys.stdout = buffer

for p in processes:
    p.start()
for p in processes:
    p.join()

for i in range(20):
    print i,
print

sys.stdout = sys.__stdout__
sys.stderr = sys.__stderr__
print 
print 'DONE'
print
buffer.buffer()
print buffer.getvalue()

This works perfectly 95% of the time, but it has a few edge-case problems. I have to run the test script in a fast while loop to reproduce them:

  1. 3% of the time, the parent process's output isn't fully reflected. I assume this is because the data is consumed before the queue-flushing thread can catch up. I haven't found a way to wait for that thread without deadlocking.
  2. 0.5% of the time, there's a traceback from the multiprocessing.Queue implementation.
  3. 0.01% of the time, the PIDs wrap around, and so sorting by PID gives the wrong ordering.
  4. In the worst case (odds: roughly one in 70 million), the output looks like this:

    START
    
    DONE
    
    302 wrote: '19\n'
    32731 wrote: '0 1 2 3 4 5 6 7 8 '
    32732 wrote: '0\n'
    32734 wrote: '1\n'
    32735 wrote: '2\n'
    32736 wrote: '3\n'
    32737 wrote: '4\n'
    32738 wrote: '5\n'
    32743 wrote: '6\n'
    32744 wrote: '7\n'
    32745 wrote: '8\n'
    32749 wrote: '9\n'
    32751 wrote: '10\n'
    32752 wrote: '11\n'
    32753 wrote: '12\n'
    32754 wrote: '13\n'
    32756 wrote: '14\n'
    32757 wrote: '15\n'
    32759 wrote: '16\n'
    32760 wrote: '17\n'
    32761 wrote: '18\n'
    
    Exception in thread QueueFeederThread (most likely raised during interpreter shutdown):
    Traceback (most recent call last):
      File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
      File "/usr/lib/python2.6/threading.py", line 484, in run
          File "/usr/lib/python2.6/multiprocessing/queues.py", line 233, in _feed
    <type 'exceptions.TypeError'>: 'NoneType' object is not callable
    

    Under python2.7 the exception is slightly different:

    Exception in thread QueueFeederThread (most likely raised during interpreter shutdown):
    Traceback (most recent call last):
      File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
      File "/usr/lib/python2.7/threading.py", line 505, in run
      File "/usr/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    <type 'exceptions.IOError'>: [Errno 32] Broken pipe
    

    How do I get rid of these edge cases?

2 answers:

Answer 0 (score: 9):

The solution came in two parts. I've successfully run the test program 200 thousand times without any change in the output.

The easy part was to use multiprocessing.current_process()._identity to sort the messages. This is not part of the published API, but it is a unique, deterministic identifier for each process. This fixed the problem of PIDs wrapping around and giving a bad ordering of output.
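A minimal sketch of that sorting change, in modern Python (the `drain_sorted` helper and the plain `queue.Queue` stand-in are illustrative, not the answerer's exact code; a real child process would enqueue `multiprocessing.current_process()._identity` as the key):

```python
import queue
from collections import defaultdict

def drain_sorted(q):
    """Drain (identity, data) pairs from q, grouping output per process.

    current_process()._identity is a tuple like (1,) or (2,), assigned in
    process-creation order, so sorting on it stays deterministic even when
    operating-system PIDs wrap around.
    """
    cache = defaultdict(str)
    while True:
        try:
            identity, data = q.get_nowait()
        except queue.Empty:
            break
        cache[identity] += data
    return [(ident, cache[ident]) for ident in sorted(cache)]

# A child would do:  q.put((current_process()._identity, data))
# Simulated here with a plain queue and out-of-order arrival:
q = queue.Queue()
q.put(((2,), 'second\n'))
q.put(((1,), 'first '))
q.put(((1,), 'part\n'))
print(drain_sorted(q))  # [((1,), 'first part\n'), ((2,), 'second\n')]
```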

The other part of the solution was to use multiprocessing.Manager().Queue() rather than multiprocessing.Queue. This solves problem #2 above, because the manager lives in a separate process, and so avoids some of the bad special cases when using a Queue from the owning process. #3 is fixed because the Queue is fully exhausted, and the feeder thread dies naturally, before python begins shutting down and closes stdin.
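A sketch of that swap (only the queue construction in `MultiProcessFile.__init__` changes; shown here as a standalone snippet in modern Python):

```python
import multiprocessing

# In MultiProcessFile.__init__, replace
#     self.__queue = multiprocessing.Queue()
# with a manager-backed queue:
#     self.__queue = multiprocessing.Manager().Queue()
# The manager proxies the queue through a separate server process, so the
# owning process never runs the feeder thread that can blow up during
# interpreter shutdown.
if __name__ == '__main__':
    manager = multiprocessing.Manager()
    q = manager.Queue()
    q.put((1234, '19\n'))
    print(q.get())  # (1234, '19\n')
    manager.shutdown()
```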

Answer 1 (score: 0):

I encountered far fewer multiprocessing bugs with Python 2.7 than with Python 2.6. Having said that, the solution I used to avoid the "Exception in thread QueueFeederThread" problem is to sleep momentarily, possibly for 0.01s, in each process in which a Queue is used. It's true that using sleep is undesirable and not even reliable, but the specified duration was observed to work well enough in practice for me. You can also try 0.1s.
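Applied to the question's test script, the workaround looks something like this (a sketch; the `pause` parameter and its duration are empirical guesses, as the answer itself admits, not a guarantee):

```python
import queue
import time

def printer(msg, q, pause=0.01):
    # After the process's last write, pause briefly so the Queue's feeder
    # thread has a chance to flush its buffer before the process exits.
    q.put(msg)
    time.sleep(pause)

# Demonstrated with a plain queue.Queue stand-in; a real run would pass a
# multiprocessing.Queue and use printer() as the Process target.
q = queue.Queue()
printer('19\n', q, pause=0.001)
print(repr(q.get()))  # '19\n'
```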