Python multiprocessing concurrent writes to a file: Queue vs Pool with global Lock initializer

Date: 2016-10-01 18:01:23

Tags: python, multiprocessing

I have an embarrassingly parallel problem: the function being parallelized shares no in-memory state, but each call needs to append one line to a CSV file. The lines may be appended in any order, and a full run can take a long time, so we need to be able to read progress from the CSV file while it is still being produced.

Is it safe/better to use a Pool whose initializer installs a global Lock, compared with the approach in [1], where the worker processes feed a Queue and a single process does all the writing to the CSV file? (A sketch of that Queue-based setup is included after the output below for comparison.)

[1] Python multiprocessing safely writing to a file

::

from random import random
from time import sleep, time
from multiprocessing import Pool, Lock
import os

def add_to_csv(line,  fd='/tmp/a.csv'):
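    # 'lock' is the global Lock installed by init() in each worker process;
    # holding it serializes appends so lines from different workers do not interleave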
    pid = os.getpid()
    with lock:
        with open(fd, 'a') as csvfile:
            sleep(1)
            csvfile.write(line)
    print '    line added by {}'.format(pid)

def f(x):
    start = time()
    pid = os.getpid()
    print '=> pi: {} started'.format(pid)
    sleep(6*random())
    res = 2*x
    print 'pi: {} res {} in {:2.2}s'.format(pid, res, time() - start)
    add_to_csv(str(res) + '\n')
    return res

def init(l):
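    # Pool initializer: runs once in each worker and stores the shared Lock as a global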
    global lock
    lock = l

if __name__ == '__main__':
    sleep(2)
    lock = Lock()
    pool = Pool(initializer=init, initargs=(lock,))
    out = pool.map(f, [1, 2, 3, 4])
    print out

Running it gives this::

=> pi: 521 started
=> pi: 522 started
=> pi: 523 started
=> pi: 524 started
pi: 521 res 2 in 1.3s
    line added by 521
pi: 523 res 6 in 3.4s
    line added by 523
pi: 524 res 8 in 5.2s
pi: 522 res 4 in 5.4s
    line added by 524
    line added by 522
[2, 4, 6, 8]
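
For comparison, here is a minimal, untested sketch of the Queue-based alternative from [1]: the workers only put lines on a queue, and a single writer process owns the file. Unlike [1], which reserves a Pool slot for the writer, this sketch runs the writer in a dedicated Process; the path '/tmp/a.csv' and the doubling workload are carried over from the example above::

from multiprocessing import Pool, Process, Manager

def listener(q, fd='/tmp/a.csv'):
    # single writer: only this process ever opens the file
    with open(fd, 'a') as csvfile:
        for line in iter(q.get, None):   # None is the stop sentinel
            csvfile.write(line)
            csvfile.flush()              # keep progress readable while the run continues

def worker(args):
    x, q = args
    res = 2*x
    q.put(str(res) + '\n')   # hand the line to the writer instead of writing it here
    return res

if __name__ == '__main__':
    manager = Manager()
    q = manager.Queue()      # a Manager queue can be passed through Pool.map arguments
    writer = Process(target=listener, args=(q,))
    writer.start()
    pool = Pool()
    out = pool.map(worker, [(x, q) for x in [1, 2, 3, 4]])
    pool.close()
    pool.join()
    q.put(None)              # tell the listener to stop once all workers are done
    writer.join()
    print(out)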

0 Answers:

No answers yet