Question

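In Python 3, I need a pool of processes to which multiple workers are applied asynchronously.

The problem is that I need to "send" workers to the pool from a series of separate Python processes. So all the workers should be executed in the same Pool instance.

N.B. The goal is to process a lot of data without using all of the computer's resources.

Take the following multi.py example code: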

import multiprocessing
from time import sleep

def worker(x):
    sleep(5)
    return x*x

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=max(1, multiprocessing.cpu_count() // 2)) # Using half of the CPU cores

    for i in range(10):
        pool.apply_async(worker, args=(i,))

    pool.close() # No more jobs will be submitted
    pool.join() # Wait for the queued jobs to finish before exiting

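I need to launch multiple instances of multi.py, all of them appending workers to the same pool.

Reading the official documentation, I could not work out how to do this. I understand I would need a Manager(), but how should I use it?

Any advice on doing this in a Pythonic way, or does anyone have a piece of code?

Thanks, everyone.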

Answer 1

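In the end I was able to write a working basic example using the Python 3 BaseManager. See the documentation for multiprocessing.managers for details.

In a script called server.py: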

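import multiprocessing
from multiprocessing.managers import BaseManager

if __name__ == "__main__":  # guard needed on platforms that spawn child processes
    jobs = multiprocessing.Manager().Queue()  # the queue shared by all clients
    BaseManager.register('JobsQueue', callable=lambda: jobs)
    m = BaseManager(address=('localhost', 55555), authkey=b'myauthkey')
    s = m.get_server()
    s.serve_forever()  # blocks, serving the queue until interrupted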

Then, in one or more client.py scripts:

from multiprocessing.managers import BaseManager

BaseManager.register('JobsQueue') # No callable here: see the difference with the server!
m = BaseManager(address=('localhost', 55555), authkey=b'myauthkey') # Use same authkey! It may work remotely too...
m.connect()
# Then you can put data in the queue
q = m.JobsQueue()
q.put("MY DATA HERE")
# or also
data = q.get()
# etc etc...
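To try the server/client pair without opening two terminals, here is a minimal sketch that runs the server in a background thread of the same process. It is only an illustration under some assumptions: the helper names start_server and connect are hypothetical, a plain queue.Queue is swapped in for multiprocessing.Manager().Queue(), and port 0 is used so the operating system picks a free port.

```python
import queue
import threading
from multiprocessing.managers import BaseManager


def start_server(authkey=b'myauthkey'):
    """Serve a JobsQueue from a daemon thread and return the bound address.

    Hypothetical helper: a plain queue.Queue stands in for the
    multiprocessing.Manager().Queue() used in server.py.
    """
    jobs = queue.Queue()

    class ServerManager(BaseManager):
        """Subclass so the registration does not touch BaseManager itself."""

    # Server side: register WITH a callable that returns the shared queue.
    ServerManager.register('JobsQueue', callable=lambda: jobs)
    # Port 0 asks the OS for any free port (an assumption of this sketch).
    server = ServerManager(address=('localhost', 0), authkey=authkey).get_server()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server.address


def connect(address, authkey=b'myauthkey'):
    """Connect like client.py does and return a proxy to the shared queue."""

    class ClientManager(BaseManager):
        """Separate subclass for the client-side registration."""

    # Client side: register WITHOUT a callable, as in client.py.
    ClientManager.register('JobsQueue')
    m = ClientManager(address=address, authkey=authkey)
    m.connect()
    return m.JobsQueue()
```

Two calls to connect() with the address returned by start_server() give two proxies to the same queue, so an item put through one can be read through the other.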

Obviously this is a basic example, but I think it makes it possible to do a lot of complex work without using external libraries.
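One way the queue could be tied back to the single Pool from the question is to let one consumer process own the Pool and feed it from JobsQueue, while every client only puts data. The sketch below is not part of the original answer: dispatch, run_consumer and the None sentinel (STOP) are hypothetical names and conventions, and the address and authkey are simply those used in server.py.

```python
import multiprocessing
from multiprocessing.managers import BaseManager

STOP = None  # hypothetical sentinel: a client puts None to end the run


def worker(x):
    """The toy job from the question, without the sleep."""
    return x * x


def dispatch(jobs, pool, func=worker):
    """Submit items from `jobs` to `pool` until STOP arrives.

    `jobs` only needs a blocking get(); `pool` only needs apply_async().
    Results are returned in submission order.
    """
    pending = [pool.apply_async(func, args=(item,))
               for item in iter(jobs.get, STOP)]
    return [p.get() for p in pending]


def run_consumer(address=('localhost', 55555), authkey=b'myauthkey'):
    """Connect to server.py and run every queued job in one shared Pool."""
    BaseManager.register('JobsQueue')  # client-side registration, no callable
    m = BaseManager(address=address, authkey=authkey)
    m.connect()
    size = max(1, multiprocessing.cpu_count() // 2)  # half the cores, as in the question
    with multiprocessing.Pool(processes=size) as pool:
        return dispatch(m.JobsQueue(), pool)
```

With server.py already running, the consumer would be started once by calling run_consumer() under an if __name__ == "__main__": guard. Each client.py then only calls q.put(...), and a final q.put(None) ends the run.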

Today many people look for ready-to-use, often heavyweight, libraries or software without understanding the basics. I am not one of them...

Cheers

Sharing the same multiprocessing.Pool object between different Python instances
