Python multithreading + multiprocessing BrokenPipeError (child processes not exiting?)

Date: 2014-10-04 01:13:13

Tags: python multithreading multiprocessing

I get a BrokenPipeError when threads hand work off to processes through a multiprocessing.JoinableQueue. It seems to happen after the program has finished its work and is trying to exit, since it does do everything it is supposed to do. What does this mean, and is there a way to fix it or safely ignore it?

import requests
import multiprocessing
from multiprocessing import JoinableQueue
from queue import Queue
import threading


class ProcessClass(multiprocessing.Process):
    def __init__(self, func, in_queue, out_queue):
        super().__init__()
        self.in_queue = in_queue
        self.out_queue = out_queue
        self.func = func

    def run(self):
        while True:
            arg = self.in_queue.get()
            self.func(arg, self.out_queue)
            self.in_queue.task_done()


class ThreadClass(threading.Thread):
    def __init__(self, func, in_queue, out_queue):
        super().__init__()
        self.in_queue = in_queue
        self.out_queue = out_queue
        self.func = func

    def run(self):
        while True:
            arg = self.in_queue.get()
            self.func(arg, self.out_queue)
            self.in_queue.task_done()


def get_urls(host, out_queue):
    r = requests.get(host)
    out_queue.put(r.text)
    print(r.status_code, host)


def get_title(text, out_queue):
    print(text.strip('\r\n ')[:5])


if __name__ == '__main__':
    def test():

        q1 = JoinableQueue()
        q2 = JoinableQueue()

        for i in range(2):
            t = ThreadClass(get_urls, q1, q2)
            t.daemon = True
            t.setDaemon(True)
            t.start()

        for i in range(2):
            t = ProcessClass(get_title, q2, None)
            t.daemon = True
            t.start()

        for host in ("http://ibm.com", "http://yahoo.com", "http://google.com", "http://amazon.com", "http://apple.com",):
            q1.put(host)

        q1.join()
        q2.join()

    test()
    print('Finished')

Program output:

200 http://ibm.com
<!DOC
200 http://google.com
<!doc
200 http://yahoo.com
<!DOC
200 http://apple.com
<!DOC
200 http://amazon.com
<!DOC
Finished
Exception in thread Thread-2:
Traceback (most recent call last):
  File "C:\Python\33\lib\multiprocessing\connection.py", line 313, in _recv_bytes
    nread, err = ov.GetOverlappedResult(True)
BrokenPipeError: [WinError 109] The pipe has been ended

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python\33\lib\threading.py", line 901, in _bootstrap_inner
    self.run()
  File "D:\Progs\Uspat\uspat\spider\run\threads_test.py", line 31, in run
    arg = self.in_queue.get()
  File "C:\Python\33\lib\multiprocessing\queues.py", line 94, in get
    res = self._recv()
  File "C:\Python\33\lib\multiprocessing\connection.py", line 251, in recv
    buf = self._recv_bytes()
  File "C:\Python\33\lib\multiprocessing\connection.py", line 322, in _recv_bytes
    raise EOFError
EOFError
....

(The same error, snipped, for the other threads.)

If I switch the JoinableQueue to a queue.Queue for the multithreaded part, everything works fine, but why?
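
(For reference, that switch presumably just means building q1, which only the threads read from, as a plain queue.Queue; q2 is consumed by the processes, so it has to stay a multiprocessing queue. A minimal sketch of that reading:)

    from queue import Queue
    from multiprocessing import JoinableQueue

    q1 = Queue()           # thread-only queue: get() blocks on an in-process lock, no pipe involved
    q2 = JoinableQueue()   # crosses the process boundary, so it stays a multiprocessing queue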

1 Answer:

Answer 0 (score: 3):

This is happening because you leave the background threads blocked in a multiprocessing.Queue.get call when the main thread exits, but it only happens under certain conditions:

  1. A daemon thread is running and blocked on multiprocessing.Queue.get when the main thread exits.
  2. A multiprocessing.Process is running.
  3. The multiprocessing context is something other than 'fork'.

The exception is telling you that the other end of the Connection that the multiprocessing.JoinableQueue listens on during the get() call sent an EOF. Generally this means the other side of the Connection has been closed. It makes sense that this happens during shutdown: Python cleans up all objects prior to exiting the interpreter, and part of that clean-up involves closing all open Connection objects. What I haven't been able to figure out yet is why it only (and always) happens if a multiprocessing.Process has been spawned (rather than forked, which is why it doesn't happen on Linux by default) and is still running. I can even reproduce it with a multiprocessing.Process that just sleeps in a while loop and never touches a Queue at all. For whatever reason, the presence of a running, spawned child process seems to guarantee that the exception is raised. It may simply be that it makes the order in which things are destroyed exactly right for the race condition to occur, but that's a guess.
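
(The claim above can be checked with a small reproduction. This is a hedged sketch, not code from the original post; it assumes Windows, where processes are spawned by default, and on Linux it would need a 'spawn' start method to show the same behaviour:)

    import multiprocessing
    import threading
    import time


    def sleeper():
        # Spawned child process that never touches a queue; it just keeps running.
        while True:
            time.sleep(1)


    def blocked_getter(q):
        # Daemon thread left blocked forever on a multiprocessing queue.
        q.get()


    if __name__ == '__main__':
        q = multiprocessing.JoinableQueue()

        t = threading.Thread(target=blocked_getter, args=(q,))
        t.daemon = True
        t.start()

        p = multiprocessing.Process(target=sleeper)
        p.daemon = True
        p.start()

        # The main thread exits right away; with a spawned child still running,
        # the blocked daemon thread typically hits BrokenPipeError/EOFError
        # during interpreter shutdown.
        print('Finished')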

    In any case, using a queue.Queue instead of a multiprocessing.JoinableQueue is a good way to fix it, since you don't actually need a multiprocessing queue there. You can also make sure that the background threads and background processes are shut down before the main thread exits, by sending a sentinel into their queues. So, make both run methods check for the sentinel:

        def run(self):
            for arg in iter(self.in_queue.get, None):  # None is the sentinel
                self.func(arg, self.out_queue)
                self.in_queue.task_done()
            self.in_queue.task_done()  # account for the sentinel's own get()

    And then send the sentinels when you're done:

        threads = []
        for i in range(2):
            t = ThreadClass(get_urls, q1, q2)
            t.daemon = True
            t.setDaemon(True)
            t.start()
            threads.append(t)
    
        procs = []
        for i in range(2):
            t = ProcessClass(get_title, q2, None)
            t.daemon = True
            t.start()
            procs.append(t)
    
        for host in ("http://ibm.com", "http://yahoo.com", "http://google.com", "http://amazon.com", "http://apple.com",):
            q1.put(host)
    
        q1.join()
        # All items have been consumed from input queue, lets start shutting down.
        for t in procs:
            q2.put(None)
            t.join()
        for t in threads:
            q1.put(None)
            t.join()
        q2.join()
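
    Note that one sentinel is put on each queue per worker: every blocked consumer pops exactly one None and then leaves its loop, and the trailing task_done() in run() balances that final get() so the final q2.join() can still complete.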
    

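    As a side note, the two-argument form of iter() used in run() above, iter(callable, sentinel), returns an iterator that calls the callable repeatedly and stops as soon as it returns the sentinel. A tiny standalone sketch (the names here are only illustrative):

        from queue import Queue

        q = Queue()
        for item in ("a", "b"):
            q.put(item)
        q.put(None)  # sentinel marking the end of the work

        # iter(q.get, None) keeps calling q.get() and stops when it returns None
        for item in iter(q.get, None):
            print(item)  # prints "a" then "b"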