Goal:
Current situation:
Hence the idea of using multiprocessing: I want to be able to write the output concurrently, since the workload is not CPU-bound but I/O-bound.
Background aside, here is the problem (essentially a design question): the multiprocessing library works by pickling objects and passing the data to the spawned processes; but the ResultProxy objects and the shared queue that I am trying to use in the WriteWorker process are not picklable, which results in the following message (not verbatim, but close enough):
pickle.PicklingError: Can't pickle object in WriteWorker.start()
So the question for anyone who can help: any ideas for a design pattern or approach that avoids this problem? This looks like a simple, classic producer-consumer problem, and I imagine the solution is straightforward; I'm probably just overthinking it.
Any help or feedback is appreciated! Thanks :)
Edit: here are some relevant code snippets; let me know if there is any other context I can provide.
From the parent class:
#init manager and queues
self.manager = multiprocessing.Manager()
self.query_queue = self.manager.Queue()
self.write_queue = self.manager.Queue()

def _get_data(self):
    #spawn a pool of query processes, and pass them the query queue instance
    for i in xrange(self.NUM_QUERY_THREADS):
        qt = QueryWorker.QueryWorker(self.query_queue, self.write_queue, self.config_values, self.args)
        qt.daemon = True
        qt.start()

    #populate query queue
    self.parse_sql_queries()

    #spawn a pool of writer processes, and pass them the output queue instance
    for i in range(self.NUM_WRITE_THREADS):
        wt = WriteWorker.WriteWorker(self.write_queue, self.output_path, self.WRITE_BUFFER, self.output_dict)
        wt.daemon = True
        wt.start()

    #wait on the queues until everything has been processed
    self.query_queue.join()
    self.write_queue.join()
And from the QueryWorker class:
def run(self):
    while True:
        #grab a (table, query, query_num) tuple from the query queue
        query_tuple = self.query_queue.get()
        table = query_tuple[0]
        query = query_tuple[1]
        query_num = query_tuple[2]
        if query and table:
            #grab connection from pool, run the query
            connection = self.engine.connect()
            print 'Running query #' + str(query_num) + ': ' + table
            try:
                result = connection.execute(query)
            except Exception:
                print 'Error while running query #' + str(query_num) + ': \n\t' + str(query) + '\nError: ' + str(sys.exc_info()[1])
                #the query failed, so result is undefined: mark the task done and skip the put
                self.query_queue.task_done()
                continue
            #place result handle tuple into out queue
            self.out_queue.put((table, result))
        #signal to the queue that this job is done
        self.query_queue.task_done()
Answer 0: (score: 1)
The simple answer is to avoid using the ResultProxy directly. Instead, fetch the data out of the ResultProxy with cursor.fetchall() or cursor.fetchmany(number_to_fetch), and then pass that data through the multiprocessing queue.
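A minimal, self-contained sketch of that idea (using the stdlib sqlite3 module and plain pickle to stand in for SQLAlchemy and the multiprocessing queue; the table and rows here are made up for illustration): the live cursor/result handle cannot be pickled, but the fetched rows are plain tuples, so fetch first and queue only the data.

```python
import pickle
import sqlite3

# Hypothetical in-memory table standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])

cur = conn.execute("SELECT id, name FROM t")

# A live result handle cannot cross a process boundary:
try:
    pickle.dumps(cur)
    cursor_is_picklable = True
except Exception:
    cursor_is_picklable = False  # pickling the cursor fails

# ...but the fetched rows pickle fine, so they can go straight
# onto a multiprocessing queue in place of the result handle.
rows = cur.fetchall()
payload = pickle.dumps(("t", rows))  # what out_queue.put() would pickle
table, data = pickle.loads(payload)
```

The same applies with SQLAlchemy: call result.fetchall() (or result.fetchmany(n) in a loop, for large result sets) inside the QueryWorker and put the returned rows on out_queue instead of the ResultProxy itself.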