Python - Asynchronous Multiprocessing from a RabbitMQ Consumer

Posted: 2015-04-14 15:50:46

Tags: python multiprocessing rabbitmq python-multiprocessing

I have a Python program that acts as a RabbitMQ consumer. Once it receives a job from the queue, I want the program to split the job up using multiprocessing, but I'm running into logistical problems with how multiprocessing fits in.

I have simplified the code for readability.

My RabbitMQ consumer function:

import json
import logging
import time

import pika

import process_manager  # my multiprocessing module, shown below

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

channel.queue_declare(queue="JobReader", durable=True)
logging.info('Waiting for messages..')


def callback(ch, method, properties, body):
    job_info = json.loads(body)

    logging.info('Start Time: ' + time.strftime("%H:%M:%S"))

    # split_job is a helper of mine (omitted for brevity).
    split_jobs = split_job(job_info)

    process_manager.runProcesses(split_jobs)

    ch.basic_ack(delivery_tag=method.delivery_tag)

# (basic_consume / start_consuming registration omitted for brevity)

My multiprocessing functions:

#!/usr/bin/python

import multiprocessing
import other_package


def worker_process(sub_job):
    other_package.run_job(sub_job)


def runProcesses(jobs):
    # Spawn one process per sub-job; the processes are started but
    # never joined.
    processes = []
    for sub_job in jobs:
        p = multiprocessing.Process(target=worker_process, args=(sub_job,))
        processes.append(p)

        p.start()

Of course, I can't use if __name__ == '__main__': here, because the spawning code lives inside a function.
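
As I understand it, the guard matters because under the spawn start method (the default on Windows) every child process re-imports the main module, so any spawning code outside the guard would run again in each child. A minimal standalone sketch of the usual layout (toy work function, not my real code):

import multiprocessing


def work(n):
    # Toy stand-in for a real sub-job.
    return n * n


if __name__ == '__main__':
    # Without this guard, children started via "spawn" would re-execute
    # the Pool creation when they re-import the module, and recurse.
    pool = multiprocessing.Pool()
    print(pool.map(work, range(4)))
    pool.close()
    pool.join()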

I'm not sure whether there is a workaround for multiprocessing here, or whether I'm simply approaching this the wrong way. Any help would be greatly appreciated.

1 Answer:

Answer 0 (score: 2):

You can restructure the multiprocessing piece so that its state is initialized from the main script:

import json
import logging
import time

import pika

import process_manager
...

def callback(ch, method, properties, body):
    job_info = json.loads(body)
    logging.info('Start Time: ' + time.strftime("%H:%M:%S"))
    split_jobs = split_job(job_info)
    manager.runProcesses(split_jobs)
    ch.basic_ack(delivery_tag=method.delivery_tag)


if __name__ == "__main__":
    # Multiprocessing state is created here, in the entry point, where the
    # __main__ guard protects it.
    manager = process_manager.get_manager()
    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()

    channel.queue_declare(queue="JobReader", durable=True)
    logging.info('Waiting for messages..')

    channel.basic_consume(queue="JobReader", on_message_callback=callback)  # pika >= 1.0 signature
    channel.start_consuming()

process_manager would then look something like this:

import multiprocessing

import other_package


def worker_process(sub_job):
    other_package.run_job(sub_job)


_manager = None


def get_manager():  # Note that you don't have to use a singleton here
    global _manager
    if not _manager:
        _manager = Manager()
    return _manager


class Manager(object):
    def __init__(self):
        # One pool, sized to the CPU count by default, reused for every job.
        self._pool = multiprocessing.Pool()

    def runProcesses(self, jobs):
        # Fire-and-forget: map_async returns an AsyncResult you could keep
        # if you ever need to wait on or inspect the sub-jobs.
        self._pool.map_async(worker_process, jobs)

Note that I use a Pool rather than spawning a Process per job, since creating a new process for every job may not scale well.
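
One caveat: map_async returns immediately, so the callback above acks the message before the sub-jobs have finished. If the ack should only happen after the work completes, a blocking variant is possible; a minimal sketch, using a hypothetical BlockingManager name and a toy worker (the real code would live in process_manager), could look like this:

import multiprocessing


def worker_process(sub_job):
    # Toy stand-in for other_package.run_job(sub_job).
    print('running', sub_job)


class BlockingManager(object):
    def __init__(self):
        self._pool = multiprocessing.Pool()

    def runProcesses(self, jobs):
        # map() blocks until every sub-job has finished, so the RabbitMQ
        # callback only acks once the whole job is done. Note that this
        # stalls the consumer: no new messages are handled while waiting.
        self._pool.map(worker_process, jobs)


if __name__ == '__main__':
    BlockingManager().runProcesses(['a', 'b', 'c'])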