Possibly related question: Trouble using a lock with multiprocessing.Pool: pickling error
Question: I have a scraper running, and I created a multiprocessing pool to speed it up. Because I want to be able to pick up where I left off if anything breaks the script, I use pickle to store the values. I added a lock everywhere IO and dictionary updates are involved.
So the first version of the code looked like this:
import multiprocessing
import pickle
from tqdm import tqdm

def worker():
    pass

total_work = [some list]
remaining_work = [some list]
output_dict = {}
p = multiprocessing.Pool(processes=multiprocessing.cpu_count())
lock = multiprocessing.Lock()
while remaining_work:  # in case of an exception, keep trying
    for i in tqdm(p.imap_unordered(worker, remaining_work), total=len(remaining_work)):
        lock.acquire()
        output_dict.update(i)
        with open('cache', 'wb') as f:
            pickle.dump(output_dict, f)
        with open('cache', 'rb') as f:
            current_dict = pickle.load(f)
        remaining_work = [s for s in total_work if s not in current_dict.keys()]
        lock.release()
Then, after some iterations, I got the error permission denied for opening cache. I think the problem is that multiple processes are accessing the cache file at the same time, but I don't understand why, since I already added a lock. I then read the post Trouble using a lock with multiprocessing.Pool: pickling error, and accordingly replaced my lock with:
p = multiprocessing.Pool(processes=multiprocessing.cpu_count())
m = multiprocessing.Manager()
lock = m.Lock()
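For context on that substitution: a plain multiprocessing.Lock() cannot be pickled as a task argument for pool workers, whereas a Manager().Lock() returns a picklable proxy that can. A minimal sketch illustrating the difference (the function names here are illustrative, not from the original code):

```python
import multiprocessing

def use_lock(lock):
    # Each worker briefly holds the shared lock.
    with lock:
        return multiprocessing.current_process().name

def main():
    m = multiprocessing.Manager()
    lock = m.Lock()  # a picklable proxy, safe to pass to pool tasks
    with multiprocessing.Pool(processes=2) as p:
        # Passing a plain multiprocessing.Lock() here instead would fail
        # with a pickling/RuntimeError; the Manager proxy works.
        return p.starmap(use_lock, [(lock,) for _ in range(4)])

if __name__ == "__main__":
    print(len(main()))
```

Note, though, that in the loop above the lock is only ever acquired in the parent process (the workers never receive it), so swapping the lock type alone may not change the file-access behavior.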
I still get the same error.
Can anyone help me? Or is there a better way to do this?
The feature I want to keep: tqdm as a progress bar showing the overall progress. Thank you very much!
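For what it's worth, one pattern sometimes used to harden the checkpoint write in code like the above is to dump to a temporary file and atomically rename it over the cache, so no reader ever sees a half-written file. This is a sketch of that pattern under the assumption that the permission error is file-access related, not a confirmed fix; save_checkpoint is an illustrative helper name:

```python
import os
import pickle
import tempfile

def save_checkpoint(path, data):
    # Write to a temp file in the same directory, then atomically
    # replace the target, so the cache file is never half-written.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(data, f)
        os.replace(tmp, path)
    except BaseException:
        os.remove(tmp)  # clean up the temp file on failure
        raise

if __name__ == "__main__":
    save_checkpoint("cache", {"item1": "done"})
    with open("cache", "rb") as f:
        print(pickle.load(f))
```

Since only the parent process iterates the imap_unordered results, this write happens in a single process, which also makes the lock around it unnecessary.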