Question

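I want to create a set of processes with the following structure:

main, which dequeues requests from an external source. main spawns a variable number of worker processes.
worker, which does some preliminary processing on a job request and then sends the data to gpuProc.
gpuProc, which accepts job requests from the worker processes. Once it has received enough requests, it sends the batch to a process running on the GPU. After getting the results, it must send the completed batch back to the worker processes, so that each result reaches the worker that requested it.

One could imagine doing this with many queues. Since the number of worker processes is variable, it would be ideal if gpuProc had a single input queue into which each worker put its job request together with its own return queue, as a tuple. However, this is not possible: in Python you can only share vanilla queues through inheritance, and queues created with manager.Queue() fail with: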

RemoteError: 
---------------------------------------------------------------------------
Unserializable message: ('#RETURN', ('Worker 1 asked proc to do some work.', <Queue.Queue instance at 0x7fa0ba14d908>))
---------------------------------------------------------------------------

Is there a Pythonic way to do this without resorting to an external library?

Answer 1

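multiprocessing.Queue is implemented with a pipe, a deque, and a thread.

When you call queue.put(), the object ends up in the deque, and the thread takes care of pushing it into the pipe.

For obvious reasons, you cannot share a thread between processes, so a queue cannot simply be sent to another process. You need to use something else.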
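To make the pipe, deque, and thread description concrete, here is a deliberately simplified model. ToyQueue and its attribute names are hypothetical and this is not the CPython source (the real class, for example, starts its feeder thread lazily and adds locking around the pipe); it is only a sketch of why the object is tied to the process that owns the thread.

```python
import collections
import multiprocessing
import threading


class ToyQueue:
    """Simplified model of a process-safe queue: deque + feeder thread + pipe."""

    def __init__(self):
        # One-way pipe: the first end only receives, the second only sends.
        self._reader, self._writer = multiprocessing.Pipe(duplex=False)
        self._buffer = collections.deque()
        self._not_empty = threading.Condition()
        # The feeder thread lives in the process that created the queue.
        self._feeder = threading.Thread(target=self._feed, daemon=True)
        self._feeder.start()

    def put(self, obj):
        # put() only appends to the in-process deque and wakes the feeder.
        with self._not_empty:
            self._buffer.append(obj)
            self._not_empty.notify()

    def _feed(self):
        # The feeder moves buffered objects from the deque into the pipe.
        while True:
            with self._not_empty:
                while not self._buffer:
                    self._not_empty.wait()
                obj = self._buffer.popleft()
            self._writer.send(obj)

    def get(self):
        # Readers take objects from the other end of the pipe.
        return self._reader.recv()
```

The pipe ends are plain OS resources, but the deque and the thread exist only inside one process, which is the part that cannot be handed over.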

Regular pipes and sockets, on the other hand, can be shared easily.
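As a rough sketch of the pipe-based route, each worker can tag its request with an id on one shared request queue, and gpuProc can answer over a per-worker pipe created before the processes start. Everything named here (run_demo, BATCH_SIZE, done_queue, the upper-casing that stands in for GPU work) is an assumption made for illustration, and the example uses the fork start method, so it assumes a Unix-like platform. The number of workers is chosen at startup, not changed afterwards.

```python
import multiprocessing as mp

BATCH_SIZE = 2  # hypothetical batch size, chosen only for the demo


def worker(worker_id, request_queue, reply_conn, done_queue):
    # Stand-in for the preliminary processing of a job request.
    request = "job from worker %d" % worker_id
    request_queue.put((worker_id, request))
    # Block until gpu_proc answers on this worker's own pipe.
    result = reply_conn.recv()
    done_queue.put((worker_id, result))


def gpu_proc(request_queue, reply_conns, total_jobs):
    handled = 0
    while handled < total_jobs:
        # Collect a batch (the last batch may be smaller).
        size = min(BATCH_SIZE, total_jobs - handled)
        batch = [request_queue.get() for _ in range(size)]
        # Stand-in for the GPU computation: upper-case each payload,
        # then route every result back to the worker that asked for it.
        for worker_id, payload in batch:
            reply_conns[worker_id].send(payload.upper())
        handled += len(batch)


def run_demo(num_workers=3, start_method="fork"):
    ctx = mp.get_context(start_method)
    request_queue = ctx.Queue()
    done_queue = ctx.Queue()
    # One pipe per worker: gpu_proc keeps one end, the worker gets the other.
    pipes = [ctx.Pipe() for _ in range(num_workers)]
    gpu = ctx.Process(
        target=gpu_proc,
        args=(request_queue, [gpu_end for gpu_end, _ in pipes], num_workers),
    )
    workers = [
        ctx.Process(target=worker, args=(i, request_queue, worker_end, done_queue))
        for i, (_, worker_end) in enumerate(pipes)
    ]
    procs = [gpu] + workers
    for proc in procs:
        proc.start()
    # Drain the results before joining so no child blocks on a full queue.
    results = dict(done_queue.get() for _ in range(num_workers))
    for proc in procs:
        proc.join()
    return results
```

Calling run_demo(3) returns a dict that maps each worker id to the reply it received. With a start method other than fork, the call would need to sit under an `if __name__ == "__main__":` guard.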

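However, I would rather use a different architecture for your program.

The main process would act as an orchestrator, routing tasks to two different pools of processes: one for CPU-bound jobs and the other for GPU-bound ones. This means you need to pass more information between the workers, but the design is more robust and scalable.

Here is a draft: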

from multiprocessing import Pool

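def cpu_worker(job_type, data):
    # do_cpu_work and do_post_gpu_work are placeholders for the real CPU routines.
    if job_type == "first_computation":
        results = do_cpu_work(data)
    elif job_type == "compute_gpu_results":
        results = do_post_gpu_work(data)
    else:
        raise ValueError("unknown job type: %r" % (job_type,))

    return results

def gpu_worker(data):
    # do_gpu_work is a placeholder for the real GPU routine.
    return do_gpu_work(data)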

class Orchestrator:
    def __init__(self):
        self.cpu_pool = Pool()
        self.gpu_pool = Pool()

    def new_task(self, task):
        """Entry point for a new task. The task will be run by the CPU workers and the results handled by the cpu_job_done method."""
        self.cpu_pool.apply_async(cpu_worker, args=["first_computation", task], callback=self.cpu_job_done)

    def cpu_job_done(self, results):
        """Once the first CPU computation is done, send its results to a GPU worker. Its results will be handled by the gpu_job_done method."""
        self.gpu_pool.apply_async(gpu_worker, args=[results], callback=self.gpu_job_done)

    def gpu_job_done(self, results):
        """GPU computation done, send the data back for the last CPU computation phase. Results will be handled by the task_done method."""
        self.cpu_pool.apply_async(cpu_worker, args=["compute_gpu_results", results], callback=self.task_done)

    def task_done(self, results):
        """Here you get your final results for the task."""
        print(results)
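To try the draft end to end, the placeholder functions need bodies and main needs a way to wait for the chain of callbacks to finish. The following is one possible sketch, not the only way to do it. The arithmetic in do_cpu_work, do_gpu_work, and do_post_gpu_work is invented, and the run method, the all_done event, the task_failed error callback, and the pool sizes are all additions for illustration. It uses the fork start method, so it assumes a Unix-like platform.

```python
import multiprocessing as mp
import threading


# Hypothetical stand-ins for the real CPU and GPU routines.
def do_cpu_work(data):
    return data + 1


def do_gpu_work(data):
    return data * 2


def do_post_gpu_work(data):
    return data - 3


def cpu_worker(job_type, data):
    if job_type == "first_computation":
        return do_cpu_work(data)
    if job_type == "compute_gpu_results":
        return do_post_gpu_work(data)
    raise ValueError("unknown job type: %r" % (job_type,))


def gpu_worker(data):
    return do_gpu_work(data)


class Orchestrator:
    def __init__(self, start_method="fork"):
        ctx = mp.get_context(start_method)
        self.cpu_pool = ctx.Pool(2)  # pool sizes are arbitrary demo values
        self.gpu_pool = ctx.Pool(1)
        self.results = []
        self.error = None
        self.expected = 0
        self.all_done = threading.Event()

    def run(self, tasks):
        """Submit every task, wait for the callback chain, return sorted results."""
        tasks = list(tasks)
        self.expected = len(tasks)
        if not tasks:
            self.all_done.set()
        for task in tasks:
            self.cpu_pool.apply_async(cpu_worker, args=["first_computation", task],
                                      callback=self.cpu_job_done,
                                      error_callback=self.task_failed)
        self.all_done.wait()
        # At this point all work is either finished or abandoned after a failure.
        for pool in (self.cpu_pool, self.gpu_pool):
            pool.terminate()
            pool.join()
        if self.error is not None:
            raise self.error
        return sorted(self.results)

    def cpu_job_done(self, results):
        self.gpu_pool.apply_async(gpu_worker, args=[results],
                                  callback=self.gpu_job_done,
                                  error_callback=self.task_failed)

    def gpu_job_done(self, results):
        self.cpu_pool.apply_async(cpu_worker, args=["compute_gpu_results", results],
                                  callback=self.task_done,
                                  error_callback=self.task_failed)

    def task_done(self, results):
        # Runs in the CPU pool's result-handling thread of the main process.
        self.results.append(results)
        if len(self.results) == self.expected:
            self.all_done.set()

    def task_failed(self, exc):
        # Without an error callback a failed job would stall the chain silently.
        self.error = exc
        self.all_done.set()
```

With these stubs, each task goes through (x + 1) * 2 - 3. The callbacks run in threads of the main process, so they should stay short and only hand work on to the next pool.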

Sending completed jobs back to the correct process in Python
