This is a producer/worker workflow that mixes multiprocessing and gevent. I want to share some data between processes via multiprocessing queues, while inside each process gevent producers and workers fetch data and put tasks onto the queues.
task1_producer generates some data and puts it into q1; task1_worker takes data from q1 and puts the resulting data into q2 and q3.
Then task2 runs.
The problem is that the data is inserted into q2 and q3 (as the logs show), but nothing happens in task2. If you add some logging inside task2, you will find that q3 is empty. Why does this happen? What is the best way to share data between processes?
from multiprocessing import Value, Process, Queue
# from gevent.queue import Queue
from gevent import monkey, spawn, joinall
monkey.patch_all()  # Magic!
import requests
import json
import time
import logging
from logging.config import fileConfig


def configure():
    logging.basicConfig(level=logging.DEBUG,
                        format="%(asctime)s - %(module)s - line %(lineno)d - process-id %(process)d - (%(threadName)-5s)- %(levelname)s - %(message)s")
    # fileConfig(log_file_path)
    return logging


logger = configure().getLogger(__name__)


def task2(q2, q3):
    crawl = task2_class(q2, q3)
    crawl.run()


class task2_class:
    def __init__(self, q2, q3):
        self.q2 = q2
        self.q3 = q3

    def task2_producer(self):
        while not self.q2.empty():
            logger.debug("comment_weibo_id_queue not empty")
            task_q2 = self.q2.get()
            logger.debug("task_q2 is {}".format(task_q2))
            self.q4.put(task_q2)

    def worker(self):
        while not self.q3.empty():
            logger.debug("q3 not empty")
            data_q3 = self.q3.get()
            print(data_q3)

    def run(self):
        spawn(self.task2_producer).join()
        joinall([spawn(self.worker) for _ in range(40)])


def task1(user_id, q1, q2, q3):
    task = task1_class(user_id, q1, q2, q3)
    task.run()


class task1_class:
    def __init__(self, user_id, q1, q2, q3):
        self.user_id = user_id
        self.q1 = q1
        self.q2 = q2
        self.q3 = q3
        logger.debug(self.user_id)

    def task1_producer(self):
        for data in range(20):
            self.q1.put(data)
            logger.debug("{} has been put into q1".format(data))

    def task1_worker(self):
        while not self.q1.empty():
            data = self.q1.get()
            logger.debug("task1_worker data is {}".format(data))
            self.q2.put(data)
            logger.debug("{} has been inserted to q2".format(data))
            self.q3.put(data)
            logger.debug("{} has been inserted to q3".format(data))

    def run(self):
        spawn(self.task1_producer).join()
        joinall([spawn(self.task1_worker) for _ in range(40)])


if __name__ == "__main__":
    q1 = Queue()
    q2 = Queue()
    q3 = Queue()
    p2 = Process(target=task1, args=("user_id", q1, q2, q3,))
    p3 = Process(target=task2, args=(q2, q3))
    p2.start()
    p3.start()
    p2.join()
    p3.join()
Some of the logs:
2017-05-17 13:46:40,222 - demo - line 78 - process-id 13269 - (DummyThread-12)- DEBUG - 10 has been inserted to q3
2017-05-17 13:46:40,222 - demo - line 78 - process-id 13269 - (DummyThread-13)- DEBUG - 11 has been inserted to q3
2017-05-17 13:46:40,222 - demo - line 78 - process-id 13269 - (DummyThread-14)- DEBUG - 12 has been inserted to q3
2017-05-17 13:46:40,222 - demo - line 78 - process-id 13269 - (DummyThread-15)- DEBUG - 13 has been inserted to q3
2017-05-17 13:46:40,222 - demo - line 78 - process-id 13269 - (DummyThread-16)- DEBUG - 14 has been inserted to q3
2017-05-17 13:46:40,223 - demo - line 78 - process-id 13269 - (DummyThread-17)- DEBUG - 15 has been inserted to q3
2017-05-17 13:46:40,223 - demo - line 78 - process-id 13269 - (DummyThread-18)- DEBUG - 16 has been inserted to q3
2017-05-17 13:46:40,223 - demo - line 78 - process-id 13269 - (DummyThread-19)- DEBUG - 17 has been inserted to q3
2017-05-17 13:46:40,223 - demo - line 78 - process-id 13269 - (DummyThread-20)- DEBUG - 18 has been inserted to q3
2017-05-17 13:46:40,223 - demo - line 78 - process-id 13269 - (DummyThread-21)- DEBUG - 19 has been inserted to q3
[Finished in 0.4s]
Answer (score: 2)
gevent's patch_all is not compatible with multiprocessing.Queue. Specifically, patch_all calls patch_thread by default, and patch_thread is documented to have issues with multiprocessing.Queue.
If you want to use multiprocessing.Queue, you can pass thread=False as an argument to patch_all, or use only the specific patch functions you need, e.g. patch_socket(). (This assumes you don't need monkey-patched threads, which your example doesn't use anyway.)
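A minimal sketch of the thread=False approach (patch_all and its thread argument are gevent's public API; the rest of the script is illustrative and not from the original post):

```python
# Sketch: skip gevent's thread patch so multiprocessing.Queue keeps working.
# Only the patch_all() call differs from the "Magic!" line in the question.
from gevent import monkey
monkey.patch_all(thread=False)  # real threads/locks stay intact for mp.Queue

from multiprocessing import Process, Queue

def child(q):
    # A blocking get() is safe here: the queue's internal feeder
    # machinery still uses real (unpatched) threads and locks.
    print(q.get())

if __name__ == "__main__":
    q = Queue()
    p = Process(target=child, args=(q,))
    p.start()
    q.put("hello from the parent process")
    p.join()
```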
Alternatively, you could consider an external queue such as Redis, or pass data directly over (probably Unix) sockets, which is essentially what multiprocessing.Queue does under the hood. Admittedly, both are more complex.
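For the "pass data directly" option, here is a stdlib-only sketch using multiprocessing.Pipe, roughly the mechanism multiprocessing.Queue itself builds on (the worker function and sentinel protocol are illustrative, not from the answer):

```python
# Sketch: ship data between two processes over a multiprocessing.Pipe.
# A None sentinel marks end-of-data instead of the racy empty() checks
# used in the question's code.
from multiprocessing import Pipe, Process

def worker(conn):
    # Receive items until the parent sends the None sentinel.
    total = 0
    while True:
        item = conn.recv()
        if item is None:
            break
        total += item
    conn.send(total)  # report the result back to the parent
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn,))
    p.start()
    for n in range(5):
        parent_conn.send(n)
    parent_conn.send(None)     # sentinel: no more data
    print(parent_conn.recv())  # -> 10 (0+1+2+3+4)
    p.join()
```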