python multiprocessing - check which of several running processes has finished

Asked: 2015-12-11 19:33:22

Tags: python multiprocessing waitpid

I want to run 15 commands, but only 3 at a time.

testme.py

import multiprocessing
import time
import random
import subprocess

def popen_wrapper(i):
    p = subprocess.Popen(['echo', 'hi'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = p.communicate()
    print(stdout)
    time.sleep(random.randint(5, 20))  # pretend it's doing some work
    return p.returncode

num_to_run = 15
max_parallel = 3

running = []
for i in range(num_to_run):
    p = multiprocessing.Process(target=popen_wrapper, args=(i,))
    running.append(p)
    p.start()

    if len(running) >= max_parallel:
        # blocking wait - join on whichever process finishes first, then continue
        pass
    else:
        # nonblocking wait - see if any process has finished; if so, join it
        pass

I'm not sure how to implement the following comments:

if len(running) >= max_parallel:
    # blocking wait - join on whichever process finishes first, then continue
else:
    # nonblocking wait - see if any process has finished; if so, join it

I can't simply do something like:

for p in running:
   p.join()

because the second process may already have finished while I'm still blocked joining the first.

Question: how do you check, both blocking and nonblocking, whether any process in running has finished (i.e. find the first one that completes)?

Looking for something like waitpid, perhaps.

1 Answer:

Answer 0 (score: 4):

Perhaps the simplest way is to use multiprocessing.Pool:

pool = mp.Pool(3)

sets up a pool with 3 worker processes. You can then send 15 tasks to the pool:

for i in range(num_to_run):
    pool.apply_async(popen_wrapper, args=(i,), callback=log_result)

All the machinery needed to coordinate the 3 workers and 15 tasks is handled by mp.Pool.

Using mp.Pool:

import multiprocessing as mp
import time
import random
import subprocess
import logging
logger = mp.log_to_stderr(logging.WARN)

def popen_wrapper(i):
    logger.warn('echo "hi"')
    return i

def log_result(retval):
    results.append(retval)

if __name__ == '__main__':

    num_to_run = 15
    max_parallel = 3
    results = []

    pool = mp.Pool(max_parallel)
    for i in range(num_to_run):
        pool.apply_async(popen_wrapper, args=(i,), callback=log_result)
    pool.close()
    pool.join()

    logger.warn(results)

Output:

[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-3] echo "hi"
[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-3] echo "hi"
[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-3] echo "hi"
[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-3] echo "hi"
[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-3] echo "hi"
[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-3] echo "hi"
[WARNING/PoolWorker-2] echo "hi"
[WARNING/MainProcess] [0, 2, 3, 5, 4, 6, 7, 8, 9, 10, 11, 12, 14, 13, 1]

The logging statements show which PoolWorker handled each task, and the last logging statement shows that the MainProcess received the return values from all 15 popen_wrapper calls.

If you'd rather not use a pool, you can set up one mp.Queue for the tasks and another mp.Queue for the return values:

Using mp.Process and mp.Queues:

import multiprocessing as mp
import time
import random
import subprocess
import logging
logger = mp.log_to_stderr(logging.WARN)

SENTINEL = None
def popen_wrapper(inqueue, outqueue):
    for i in iter(inqueue.get, SENTINEL):
        logger.warn('echo "hi"')
        outqueue.put(i)

if __name__ == '__main__':

    num_to_run = 15
    max_parallel = 3

    inqueue = mp.Queue()
    outqueue = mp.Queue()
    procs = [mp.Process(target=popen_wrapper, args=(inqueue, outqueue)) 
             for i in range(max_parallel)]

    for p in procs:
        p.start()
    for i in range(num_to_run):
        inqueue.put(i)
    for i in range(max_parallel):
        # Put sentinels in the queue to tell `popen_wrapper` to quit
        inqueue.put(SENTINEL)

    for p in procs:
        p.join()

    results = [outqueue.get() for i in range(num_to_run)]
    logger.warn(results)

Note that if you use

procs = [mp.Process(target=popen_wrapper, args=(inqueue, outqueue)) 
         for i in range(max_parallel)]

then you enforce exactly max_parallel (e.g. 3) worker processes. You then send all 15 tasks to a single queue:

for i in range(num_to_run):
    inqueue.put(i)

and let the workers pull tasks off the queue:

def popen_wrapper(inqueue, outqueue):
    for i in iter(inqueue.get, SENTINEL):
        logger.warn('echo "hi"')
        outqueue.put(i)

You may also be interested in Doug Hellmann's multiprocessing tutorial. Among its many useful examples, you'll find an ActivePool recipe showing how to spawn 10 processes yet throttle them (using an mp.Semaphore) so that only 3 are active at any given time. While that may be instructive, it's probably not the best solution in your case, since there seems to be no reason to spawn more than 3 processes in the first place.