multiprocessing.Queue deadlocks after the "reader" process dies

Time: 2014-01-25 11:19:06

Tags: python python-2.7 queue multiprocessing

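I've been playing with the multiprocessing package and noticed that a queue can become deadlocked for reading in the following situation: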

  1. The "reader" process is calling get with a timeout > 0:

    self.queue.get(timeout=3)
    
  2. The "reader" process dies while get is blocking, waiting on the timeout.

  3. The queue stays locked forever.

Application demonstrating the problem

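I've created two child processes: "Worker" (puts into the queue) and "Receiver" (gets from the queue). The parent process periodically checks whether its children are alive and starts a new child when needed.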

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    
    import multiprocessing
    import procname
    import time
    
    class Receiver(multiprocessing.Process):
        ''' Reads from queue with 3 secs timeout '''
    
        def __init__(self, queue):
            multiprocessing.Process.__init__(self)
            self.queue = queue
    
        def run(self):
            procname.setprocname('Receiver')
            while True:
                try:
                    msg = self.queue.get(timeout=3)
                    print '<<< `{}`, queue rlock: {}'.format(
                        msg, self.queue._rlock)
                except multiprocessing.queues.Empty:
                    print '<<< EMPTY, Queue rlock: {}'.format(
                        self.queue._rlock)
                    pass
    
    
    class Worker(multiprocessing.Process):
        ''' Puts into queue with 1 sec sleep '''
    
        def __init__(self, queue):
            multiprocessing.Process.__init__(self)
            self.queue = queue
    
        def run(self):
            procname.setprocname('Worker')
            while True:
                time.sleep(1)
                print 'Worker: putting msg, Queue size: ~{}'.format(
                    self.queue.qsize())
                self.queue.put('msg from Worker')
    
    
    if __name__ == '__main__':
        queue = multiprocessing.Queue()
    
        worker = Worker(queue)
        worker.start()
    
        receiver = Receiver(queue)
        receiver.start()
    
        while True:
            time.sleep(1)
            if not worker.is_alive():
                print 'Restarting worker'
                worker = Worker(queue)
                worker.start()
            if not receiver.is_alive():
                print 'Restarting receiver'
                receiver = Receiver(queue)
                receiver.start()
    

How the process tree looks in ps

    bash
     \_ python queuetest.py
         \_ Worker
         \_ Receiver
    

Console output

    $ python queuetest.py
    Worker: putting msg, Queue size: ~0
    <<< `msg from Worker`, queue rlock: <Lock(owner=None)>
    Worker: putting msg, Queue size: ~0
    <<< `msg from Worker`, queue rlock: <Lock(owner=None)>
    Restarting receiver                        <-- killed Receiver with SIGTERM
    Worker: putting msg, Queue size: ~0
    Worker: putting msg, Queue size: ~1
    Worker: putting msg, Queue size: ~2
    <<< EMPTY, Queue rlock: <Lock(owner=SomeOtherProcess)>
    Worker: putting msg, Queue size: ~3
    Worker: putting msg, Queue size: ~4
    Worker: putting msg, Queue size: ~5
    <<< EMPTY, Queue rlock: <Lock(owner=SomeOtherProcess)>
    Worker: putting msg, Queue size: ~6
    Worker: putting msg, Queue size: ~7
    

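Is there any way to work around this? Using get_nowait combined with sleep seems to be a workaround of sorts, but it does not read data "as it arrives".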
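
The `owner=SomeOtherProcess` in the console output suggests that the queue's read lock is still held by the killed Receiver. The sketch below is a simplified model of that pattern, not CPython's actual source: `ModelQueue` and `demo` are hypothetical names, and the assumption is that a timed get takes a read lock, polls a pipe, and releases the lock only on the way out. The dead reader is simulated by acquiring the lock and never releasing it.

```python
import multiprocessing
import time

try:
    from queue import Empty   # Python 3
except ImportError:
    from Queue import Empty   # Python 2.7, as in the question


class ModelQueue(object):
    """Toy queue: a one-way pipe guarded by a read lock (illustrative only)."""

    def __init__(self):
        self._reader, self._writer = multiprocessing.Pipe(duplex=False)
        self._rlock = multiprocessing.Lock()

    def put(self, obj):
        self._writer.send(obj)

    def get(self, timeout):
        deadline = time.time() + timeout
        # Give up if another reader keeps the lock for the whole timeout.
        if not self._rlock.acquire(True, timeout):
            raise Empty
        try:
            remaining = max(0.0, deadline - time.time())
            if not self._reader.poll(remaining):
                raise Empty
            return self._reader.recv()
        finally:
            # A reader killed before reaching this line never releases the lock.
            self._rlock.release()


def demo():
    q = ModelQueue()
    q.put('msg from Worker')
    q._rlock.acquire()  # stand-in for a reader killed while holding the lock
    try:
        return q.get(timeout=0.2)
    except Empty:
        return 'EMPTY although a message is waiting'
```

Calling `demo()` in this model reports an empty queue even though a message sits in the pipe, which matches the `<<< EMPTY` lines printed while the Worker's queue size keeps growing.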

System info

    $ uname -sr
    Linux 3.11.8-200.fc19.x86_64
    
    $ python -V
    Python 2.7.5
    
    In [3]: multiprocessing.__version__
    Out[3]: '0.70a1'
    

"It just works" solution

While writing this question, I came up with a rather silly modification to the Receiver class:

    class Receiver(multiprocessing.Process):
    
        def __init__(self, queue):
            multiprocessing.Process.__init__(self)
            self.queue = queue
    
        def run(self):
            procname.setprocname('Receiver')
            while True:
                time.sleep(1)
                while True:
                    try:
                        msg = self.queue.get_nowait()
                        print '<<< `{}`, queue rlock: {}'.format(
                            msg, self.queue._rlock)
                    except multiprocessing.queues.Empty:
                        print '<<< EMPTY, Queue rlock: {}'.format(
                            self.queue._rlock)
                        break
    

But this does not look like a good solution to me.

1 answer:

Answer 0: (score: 2)

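This probably happens because the *not_empty.release()* in Queue.get() never runs (the process has been killed). Have you tried catching the TERM signal in Receiver and releasing the Queue mutex before exiting?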
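
One way to try this suggestion is sketched below. It assumes that get() releases its read lock in a `finally` block, so that an exception raised from a signal handler unwinds through it. `raise_system_exit` is a hypothetical helper name, and the Receiver is a trimmed version of the one in the question (the procname call and the `_rlock` printing are left out).

```python
import multiprocessing
import signal

try:
    from queue import Empty   # Python 3
except ImportError:
    from Queue import Empty   # Python 2.7, as in the question


def raise_system_exit(signum, frame):
    """Turn SIGTERM into SystemExit so that pending finally blocks run."""
    raise SystemExit(0)


class Receiver(multiprocessing.Process):
    """Reads from the queue with a 3 second timeout; exits cleanly on SIGTERM."""

    def __init__(self, queue):
        multiprocessing.Process.__init__(self)
        self.queue = queue

    def run(self):
        # Install the handler inside the child, before the first get().
        signal.signal(signal.SIGTERM, raise_system_exit)
        while True:
            try:
                msg = self.queue.get(timeout=3)
                print('<<< `{}`'.format(msg))
            except Empty:
                print('<<< EMPTY')
```

The parent loop from the question can use this class unchanged. Two caveats apply: SIGKILL cannot be caught, so a Receiver killed that way would still leave the lock held, and a signal arriving in the narrow window between acquiring the lock and entering the guarded block could in principle still leak it.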