multiprocessing Queue is incompatible with gevent

Time: 2017-05-17 05:56:25

Tags: process gevent

This is a producer/worker workflow built with multiprocessing and gevent. I want to share some data between Processes via a multiprocessing Queue. At the same time, gevent producers and workers fetch the data and put tasks into the queues.

task1_producer generates some data and puts it into q1; task1_worker consumes the data from q1 and puts the results it produces into q2 and q3.

Then task2 runs.

The problem is that although the data has been inserted into q2 and q3, nothing happens in task2. If you add some logging inside task2, you will find that q3 is empty. Why is that? And what is the best way to share data between processes?

from multiprocessing import Value, Process, Queue
#from gevent.queue import Queue
from gevent import monkey, spawn, joinall
monkey.patch_all()  # Magic!
import requests
import json
import time
import logging
from logging.config import fileConfig


def configure():
    logging.basicConfig(level=logging.DEBUG,
                        format="%(asctime)s - %(module)s - line %(lineno)d  - process-id %(process)d -  (%(threadName)-5s)- %(levelname)s - %(message)s")
    # fileConfig(log_file_path)
    return logging
logger = configure().getLogger(__name__)


def task2(q2, q3):
    crawl = task2_class(q2, q3)
    crawl.run()


class task2_class:

    def __init__(self, q2, q3):
        self.q2 = q2
        self.q3 = q3

    def task2_producer(self):
        while not self.q2.empty():
            logger.debug("comment_weibo_id_queue not empty")
            task_q2 = self.q2.get()
            logger.debug("task_q2 is {}".format(task_q2))
            self.q4.put(task_q2)

    def worker(self):
        while not self.q3.empty():
            logger.debug("q3 not empty")
            data_q3 = self.q3.get()
            print(data_q3)

    def run(self):
        spawn(self.task2_producer).join()
        joinall([spawn(self.worker) for _ in range(40)])


def task1(user_id, q1, q2, q3):
    task = task1_class(user_id, q1, q2, q3)
    task.run()


class task1_class:

    def __init__(self, user_id, q1, q2, q3):
        self.user_id = user_id
        self.q1 = q1
        self.q2 = q2
        self.q3 = q3
        logger.debug(self.user_id)

    def task1_producer(self):
        for data in range(20):
            self.q1.put(data)
            logger.debug(
                "{} has been put into q1".format(data))

    def task1_worker(self):
        while not self.q1.empty():
            data = self.q1.get()
            logger.debug("task1_worker data is {}".format(data))
            self.q2.put(data)
            logger.debug(
                "{} has been inserted to q2".format(data))
            self.q3.put(data)
            logger.debug(
                "{} has been inserted to q3".format(data))

    def run(self):
        spawn(self.task1_producer).join()
        joinall([spawn(self.task1_worker) for _ in range(40)])


if __name__ == "__main__":
    q1 = Queue()
    q2 = Queue()
    q3 = Queue()
    p2 = Process(target=task1, args=(
        "user_id", q1, q2, q3,))
    p3 = Process(target=task2, args=(
        q2, q3))
    p2.start()
    p3.start()
    p2.join()
    p3.join()

Some of the logs:

2017-05-17 13:46:40,222 - demo - line 78  - process-id 13269 -  (DummyThread-12)- DEBUG - 10 has been inserted to q3
2017-05-17 13:46:40,222 - demo - line 78  - process-id 13269 -  (DummyThread-13)- DEBUG - 11 has been inserted to q3
2017-05-17 13:46:40,222 - demo - line 78  - process-id 13269 -  (DummyThread-14)- DEBUG - 12 has been inserted to q3
2017-05-17 13:46:40,222 - demo - line 78  - process-id 13269 -  (DummyThread-15)- DEBUG - 13 has been inserted to q3
2017-05-17 13:46:40,222 - demo - line 78  - process-id 13269 -  (DummyThread-16)- DEBUG - 14 has been inserted to q3
2017-05-17 13:46:40,223 - demo - line 78  - process-id 13269 -  (DummyThread-17)- DEBUG - 15 has been inserted to q3
2017-05-17 13:46:40,223 - demo - line 78  - process-id 13269 -  (DummyThread-18)- DEBUG - 16 has been inserted to q3
2017-05-17 13:46:40,223 - demo - line 78  - process-id 13269 -  (DummyThread-19)- DEBUG - 17 has been inserted to q3
2017-05-17 13:46:40,223 - demo - line 78  - process-id 13269 -  (DummyThread-20)- DEBUG - 18 has been inserted to q3
2017-05-17 13:46:40,223 - demo - line 78  - process-id 13269 -  (DummyThread-21)- DEBUG - 19 has been inserted to q3
[Finished in 0.4s]

1 Answer:

Answer 0 (score: 2)

gevent's patch_all is not compatible with multiprocessing.Queue. Specifically, patch_all calls patch_thread by default, and patch_thread is documented to have issues with multiprocessing.Queue.

If you want to use multiprocessing.Queue, you can pass thread=False as an argument to patch_all, or use only the specific patch functions you need, such as patch_socket(). (This assumes, of course, that you don't need monkey-patched threads, which your example doesn't use anyway.)
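A minimal sketch of the selective-patching approach (the helper names `child` and `demo` are made up for illustration; it assumes your application can live without green threads):

```python
# Patch everything EXCEPT the thread module, so that the real OS
# thread used internally by multiprocessing.Queue (its feeder
# thread) keeps working.
from gevent import monkey
monkey.patch_all(thread=False)

from multiprocessing import Process, Queue


def child(q):
    # runs in a separate OS process and hands data back via the queue
    q.put("hello")


def demo():
    q = Queue()
    p = Process(target=child, args=(q,))
    p.start()
    msg = q.get()   # no deadlock: the feeder thread is a real thread
    p.join()
    return msg


if __name__ == "__main__":
    print(demo())
```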

Alternatively, you could consider an external queue such as Redis, or pass the data directly over (probably Unix) sockets, which is what multiprocessing.Queue does under the hood. Admittedly, both approaches are more complex.
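As a rough illustration of the raw-socket alternative: a parent and a child process can exchange pickled data over a socketpair, which is essentially what multiprocessing.Queue automates for you. This is only a sketch (the names `worker` and `run_demo` are invented here), assuming a Unix-like platform where the child inherits the socket:

```python
import pickle
import socket
from multiprocessing import Process


def worker(conn):
    # receive one pickled task over the socket, double each item, reply
    data = pickle.loads(conn.recv(4096))
    conn.sendall(pickle.dumps([x * 2 for x in data]))
    conn.close()


def run_demo():
    parent, child = socket.socketpair()
    p = Process(target=worker, args=(child,))
    p.start()
    child.close()                       # parent keeps only its own end
    parent.sendall(pickle.dumps([1, 2, 3]))
    result = pickle.loads(parent.recv(4096))
    parent.close()
    p.join()
    return result


if __name__ == "__main__":
    print(run_demo())
```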