How to "batch write" from the output queue when using multiprocessing?

Asked: 2017-03-28 16:26:15

Tags: python multiprocessing

Suppose I have the following multiprocessing structure:

import multiprocessing as mp
def worker(working_queue, output_queue):
    while True:
        if working_queue.empty() == True:
            break 
        else:
            picked = working_queue.get()
            res_item = "Number " + str(picked)
            output_queue.put(res_item)
    return

if __name__ == '__main__':
    static_input = xrange(100)    
    working_q = mp.Queue()
    output_q = mp.Queue()
    results_bank = []
    for i in static_input:
        working_q.put(i)
    processes = [mp.Process(target=worker,args=(working_q, output_q)) for i in range(2)]
    for proc in processes:
        proc.start()
    for proc in processes:
        proc.join()
    results_bank = []
    while True:
       if output_q.empty() == True:
           break
       results_bank.append(output_q.get_nowait())
    if len(results_bank) == len(static_input):
        print "Good run"
    else:
        print "Bad run"

My question: how would I batch write my results to a single file while the working_queue is still "working" (or at least, not finished)?

Note: my actual data structure is not sensitive to unordered results relative to the input (despite my example using integers).

Also, I believe that writing batches/sets from the output queue is better practice than writing from an ever-growing results-bank object. But I am open to solutions that rely on either approach. I am new to multiprocessing, so I am unsure of the best practice or most efficient solution.

2 answers:

Answer 0: (score: 1)

If you wish to use mp.Process and mp.Queue, here is a way to process the results in batches. The main idea is the writer function, below:

import itertools as IT
import multiprocessing as mp
SENTINEL = None
static_len = 100

def worker(working_queue, output_queue):
    for picked in iter(working_queue.get, SENTINEL):
        res_item = "Number {:2d}".format(picked)
        output_queue.put(res_item)

def writer(output_queue, threshold=10):
    result_length = 0
    items = iter(output_queue.get, SENTINEL)
    for batch in iter(lambda: list(IT.islice(items, threshold)), []):
        print('\n'.join(batch))
        result_length += len(batch)
    state = 'Good run' if result_length == static_len else 'Bad run'
    print(state)

if __name__ == '__main__':
    num_workers = 2

    static_input = range(static_len)
    working_q = mp.Queue()
    output_q = mp.Queue()

    writer_proc = mp.Process(target=writer, args=(output_q,))
    writer_proc.start()

    for i in static_input:
        working_q.put(i)

    processes = [mp.Process(target=worker, args=(working_q, output_q)) 
                 for i in range(num_workers)]
    for proc in processes:
        proc.start()
        # Put SENTINELs in the Queue to tell the workers to exit their for-loop
        working_q.put(SENTINEL)
    for proc in processes:
        proc.join()

    output_q.put(SENTINEL)
    writer_proc.join()

When passed two arguments, iter expects a callable and a sentinel: iter(callable, sentinel). The callable (i.e. a function) is called repeatedly until it returns a value equal to the sentinel. So

items = iter(output_queue.get, SENTINEL)

defines items as an iterable which, when iterated, will return items from output_queue until output_queue.get() returns SENTINEL.
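The two-argument form of iter can be demonstrated on its own with a plain (non-multiprocessing) queue:

```python
from queue import Queue

SENTINEL = None
q = Queue()
for x in [1, 2, 3, SENTINEL]:
    q.put(x)

# Calling q.get repeatedly until it returns SENTINEL drains exactly [1, 2, 3]
items = list(iter(q.get, SENTINEL))
print(items)  # [1, 2, 3]
```

The same pattern works with mp.Queue, which is what the writer function relies on.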

The for-loop

for batch in iter(lambda: list(IT.islice(items, threshold)), []):

repeatedly calls the lambda function until it returns an empty list. When called, the lambda function returns a list of up to threshold items from the iterable items. Thus, this is an idiom for grouping items into batches of n without padding. For more on this idiom, see this post.
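As a standalone illustration of the grouping idiom (the helper name batched here is just for illustration, not part of the answer's code):

```python
import itertools as IT

def batched(iterable, n):
    # Yield successive lists of up to n items, with no fill values for
    # the final (possibly shorter) batch
    it = iter(iterable)
    return iter(lambda: list(IT.islice(it, n)), [])

groups = list(batched(range(7), 3))
print(groups)  # [[0, 1, 2], [3, 4, 5], [6]]
```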

Note that testing working_q.empty() is not good practice. It can lead to a race condition. For example, suppose we have 2 worker processes at these lines when only 1 item is left in working_q:

def worker(working_queue, output_queue):
    while True:
        if working_queue.empty() == True:        <-- Process-1
            break 
        else:
            picked = working_queue.get()         <-- Process-2
            res_item = "Number " + str(picked)
            output_queue.put(res_item)
    return

Suppose Process-1 calls working_queue.empty() while there is still one item in the queue, so it returns False. Then Process-2 calls working_queue.get() and takes the last item. Then Process-1 reaches the line picked = working_queue.get() and hangs, since there are no more items in the queue.

Therefore, use sentinels (as shown above) to concretely signal when a for-loop or while-loop should stop, instead of checking queue.empty().

Answer 1: (score: 0)

There is no such operation as a "batch q.get". However, putting/getting a batch of items at a time, instead of one item at a time, is a good practice.

Which is exactly what multiprocessing.Pool.map does with its chunksize parameter :)

For writing the output as soon as possible, Pool.imap_unordered returns an iterable instead of a list:

import multiprocessing

def work(item):
    return "Number " + str(item)

if __name__ == '__main__':
    static_input = range(100)
    chunksize = 10
    with multiprocessing.Pool() as pool:
        for out in pool.imap_unordered(work, static_input, chunksize):
            print(out)