More parallel processes than available processors

Time: 2018-12-27 15:55:31

Tags: python parallel-processing pathos

I used to be able to run 100 parallel processes this way:

import time
from multiprocessing import Process

def run_in_parallel(some_list):
    proc = []
    for list_element in some_list:
        # Stagger the start-up so each process has time to initialize.
        time.sleep(20)
        p = Process(target=main, args=(list_element,))
        p.start()
        proc.append(p)
    # Wait for every process to finish.
    for p in proc:
        p.join()

run_in_parallel(some_list)

But now my inputs are more complex, and I get "that" pickle error. I had to switch to pathos.
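For readers unfamiliar with "that" pickle error: the standard `multiprocessing` module serializes targets and arguments with `pickle`, which rejects some objects, while pathos relies on `dill`, which handles more of them. The question does not say which object failed, so the lambda below is only an assumed stand-in, and `try_pickle` is a hypothetical helper written for this sketch.

```python
import pickle

def try_pickle(obj):
    """Return None if obj pickles cleanly, otherwise a short error description."""
    try:
        pickle.dumps(obj)
        return None
    except (pickle.PicklingError, AttributeError, TypeError) as exc:
        return f"{type(exc).__name__}: {exc}"

# Plain data serializes without trouble.
plain_result = try_pickle((1, "a", [2.0]))

# A lambda is a common example of something the standard pickler rejects
# (assumed stand-in; the question does not name the failing object).
lambda_result = try_pickle(lambda x: x * x)

print("plain data:", plain_result)
print("lambda:    ", lambda_result)
```

Running a check like this on each argument is one way to find out which input triggers the error before handing it to a pool.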

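The minimal example below works well, but it seems to be limited by the number of threads. How can I get pathos to scale up to 100 parallel processes? My CPU has only 4 cores. My processes are idle most of the time, but they have to run for several days. I don't mind inserting a "time.sleep(20)" in there for initialization.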

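import itertools
from pathos.multiprocessing import ProcessingPool as Pool

# Pair the same shared arguments with every element of some_list.
# (Named `inputs` rather than `input` to avoid shadowing the built-in.)
inputs = zip(itertools.repeat((variable1, variable2, class1), len(some_list)), some_list)

p = Pool()
p.map(main, inputs)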

Edit: Ideally, I'd like to do p = Pool(nodes=len(some_list)), which of course doesn't work.

1 Answer:

Answer 0: (score: 0)

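I'm the pathos author. I'm not sure I'm interpreting your question correctly; it would be a bit easier to interpret if you had provided a minimal working code example. However...

Is this what you mean?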

>>> def name(x):
...   import multiprocess as mp
...   return mp.process.current_process().name
... 
>>> from pathos.multiprocessing import ProcessingPool as Pool
>>> p = Pool(ncpus=10)
>>> p.map(name, range(10))
['PoolWorker-1', 'PoolWorker-2', 'PoolWorker-3', 'PoolWorker-4', 'PoolWorker-6', 'PoolWorker-5', 'PoolWorker-7', 'PoolWorker-8', 'PoolWorker-9', 'PoolWorker-10']
>>> p.map(name, range(20))
['PoolWorker-1', 'PoolWorker-2', 'PoolWorker-3', 'PoolWorker-4', 'PoolWorker-6', 'PoolWorker-5', 'PoolWorker-7', 'PoolWorker-8', 'PoolWorker-9', 'PoolWorker-10', 'PoolWorker-1', 'PoolWorker-2', 'PoolWorker-3', 'PoolWorker-4', 'PoolWorker-6', 'PoolWorker-5', 'PoolWorker-7', 'PoolWorker-8', 'PoolWorker-9', 'PoolWorker-10']
>>>

Then, for example, if you want to reconfigure the pool to use only 4 cpus, you can do this:

>>> p.ncpus = 4
>>> p.map(name, range(20))
['PoolWorker-11', 'PoolWorker-11', 'PoolWorker-12', 'PoolWorker-12', 'PoolWorker-13', 'PoolWorker-13', 'PoolWorker-14', 'PoolWorker-14', 'PoolWorker-11', 'PoolWorker-11', 'PoolWorker-12', 'PoolWorker-12', 'PoolWorker-13', 'PoolWorker-13', 'PoolWorker-14', 'PoolWorker-14', 'PoolWorker-11', 'PoolWorker-11', 'PoolWorker-12', 'PoolWorker-12']

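I'd worry about oversubscription: if you only have 4 cores but want 100-way parallelism, you may not get the scaling you expect. Depending on how long the function you want to parallelize takes to run, you may want to use one of the other pools, such as pathos.threading.ThreadPool or one of the MPI-centric pools in pyina.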
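To illustrate why a thread pool can suit tasks that are idle most of the time, here is a minimal sketch. It uses the standard library's `concurrent.futures.ThreadPoolExecutor` as a stand-in rather than `pathos.threading.ThreadPool`, so it is not pathos code. The names `mostly_idle_task` and `some_list` are hypothetical, and the short sleep stands in for the waiting your real function does.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def mostly_idle_task(item):
    # Stand-in for a job that spends nearly all of its time waiting.
    time.sleep(0.05)
    return item, threading.current_thread().name

some_list = list(range(100))  # hypothetical inputs

# Allow up to one thread per input; threads are much cheaper than processes.
with ThreadPoolExecutor(max_workers=len(some_list)) as pool:
    results = list(pool.map(mostly_idle_task, some_list))

items = [item for item, _ in results]
thread_names = {name for _, name in results}
print(len(items), "tasks ran on", len(thread_names), "threads")
```

Threads share a single interpreter, so this avoids 100 separate python instances, but CPU-bound sections would still contend for the interpreter lock.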

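What will happen with only 4 cores and 100 processes is that the 4 cores will have 100 instances of python spawned at once... so that may be a serious memory hit, and the multiple instances of python on a single core will compete for CPU time... so it's best to experiment with the configuration a bit to find the right mix of resource oversubscription and resource idling.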
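One way to do that experimenting is to time the same workload at several pool sizes. The sketch below is only an assumed harness: it uses standard-library threads and a sleep-based `idle_task` as placeholders, and `time_pool` is a hypothetical helper. To measure your real setup you would swap in your own function and pool type, and watch memory use separately.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def idle_task(_):
    # Placeholder workload: waits instead of computing.
    time.sleep(0.02)

def time_pool(n_workers, n_tasks=20):
    """Return the wall-clock seconds needed to run n_tasks on n_workers."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(idle_task, range(n_tasks)))
    return time.perf_counter() - start

# Try a few pool sizes and compare.
timings = {n: time_pool(n) for n in (1, 4, 20)}
for n, seconds in timings.items():
    print(f"{n:>2} workers: {seconds:.2f} s")
```

For a workload that mostly waits, more workers should shorten the total time until some other resource, such as memory, becomes the limit.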