I have an embarrassingly parallel problem: the functions being parallelized share no memory state, but each call needs to append one line to a CSV file. The lines may be appended in any order, and the full run can take a long time, so we need to be able to read the CSV file while it runs to check progress.

Is it safe/better to use a Pool with a global Lock set up in the initializer, rather than (as described in [1]) a Queue fed by the worker processes plus a single process that does all the writing to the CSV file?
[1] Python multiprocessing safely writing to a file
::

    from random import random
    from time import sleep, time
    from multiprocessing import Pool, Lock
    import os

    def add_to_csv(line, fd='/tmp/a.csv'):
        pid = os.getpid()
        with lock:
            with open(fd, 'a') as csvfile:
                sleep(1)
                csvfile.write(line)
                print(' line added by {}'.format(pid))

    def f(x):
        start = time()
        pid = os.getpid()
        print('=> pi: {} started'.format(pid))
        sleep(6*random())
        res = 2*x
        print('pi: {} res {} in {:2.2}s'.format(pid, res, time() - start))
        add_to_csv(str(res) + '\n')
        return res

    def init(l):
        global lock
        lock = l

    if __name__ == '__main__':
        sleep(2)
        lock = Lock()
        pool = Pool(initializer=init, initargs=(lock,))
        out = pool.map(f, [1, 2, 3, 4])
        print(out)
Running this gives::
    => pi: 521 started
    => pi: 522 started
    => pi: 523 started
    => pi: 524 started
    pi: 521 res 2 in 1.3s
     line added by 521
    pi: 523 res 6 in 3.4s
     line added by 523
    pi: 524 res 8 in 5.2s
    pi: 522 res 4 in 5.4s
     line added by 524
     line added by 522
    [2, 4, 6, 8]
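For comparison, here is a minimal sketch of the queue-based alternative the question refers to: workers put finished lines on a managed queue, and a single writer process owns the file, so no lock is needed at all. The names ``listener``, ``worker`` and ``SENTINEL`` are illustrative, not taken from [1]::

    import multiprocessing as mp

    SENTINEL = None  # tells the listener to stop

    def listener(queue, path):
        # Sole owner of the file: appends each line as it arrives.
        with open(path, 'a') as csvfile:
            while True:
                line = queue.get()
                if line is SENTINEL:
                    break
                csvfile.write(line)
                csvfile.flush()  # keep the file readable for progress checks

    def worker(args):
        x, queue = args
        res = 2 * x
        queue.put(str(res) + '\n')  # hand the line off; no file access here
        return res

    if __name__ == '__main__':
        path = '/tmp/a.csv'
        manager = mp.Manager()
        queue = manager.Queue()  # a managed queue can be passed through Pool.map
        writer = mp.Process(target=listener, args=(queue, path))
        writer.start()
        with mp.Pool(4) as pool:
            out = pool.map(worker, [(x, queue) for x in [1, 2, 3, 4]])
        queue.put(SENTINEL)  # no more lines coming
        writer.join()        # wait until everything is on disk
        print(out)

One design difference worth noting: with the Lock version each worker blocks while it holds the file open, whereas here workers only enqueue a string and move on, and serialization happens in the single writer.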