How to save large Python objects with pickle inside joblib parallel processes

Asked: 2019-02-18 13:11:40

Tags: python pickle joblib

I am trying to save a large object (~7 GB) to a pickle binary file inside each joblib parallel process. However, joblib raises a MemoryError.

I have plenty of RAM (256 GB) and storage (4 TB), and I gave joblib 12 cores. I monitored memory usage while the jobs ran and it looked fine (more than half of total memory stayed free).

The structure of the code is simple:

import pickle
from joblib import Parallel, delayed

def do_something(arg1, arg2):
    ...
    # protocol=-1 selects the highest available pickle protocol
    # (protocol 4 on Python 3.6, which supports objects larger than 4 GB)
    with open('somefile.p', 'wb') as f:
        pickle.dump(save_something, f, protocol=-1)
    return 1

JobList = ['a1', 'b1', 'c1', 'd1',
           'a2', 'b2', 'c2', 'd2',
           'a3', 'b3', 'c3', 'd3']
arg2 = 'sth'
Parallel(n_jobs=12)(delayed(do_something)(i, arg2) for i in JobList)
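(Not part of the original question, but a possible sketch: since the saved objects appear to be large NumPy arrays, joblib's own `dump`/`load` may be more memory-friendly than `pickle.dump`, because it writes NumPy buffers to disk without building the whole pickle byte string in memory first. The file name and array shape below are illustrative only.)

```python
import numpy as np
import joblib  # joblib also provides dump/load optimized for NumPy arrays

# Illustrative small array standing in for the real ~7 GB object
arr = np.zeros((1000, 9))

# joblib.dump streams NumPy buffers to the file instead of materializing
# one large in-memory pickle string
joblib.dump(arr, 'somefile.joblib')

loaded = joblib.load('somefile.joblib')
print(loaded.shape)  # (1000, 9)
```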

I expected it to finish my jobs normally, but I don't know how to allocate (or allow) joblib to use more memory.

++) Environment
OS: Ubuntu 18.04.2 (64-bit)
Python: Python 3.6.8 (GCC 7.3.0)

joblib.externals.loky.process_executor._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/yjw0510/anaconda3/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py", line 418, in _process_worker
    r = call_item()
  File "/home/yjw0510/anaconda3/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py", line 272, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/home/yjw0510/anaconda3/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 567, in __call__
    return self.func(*args, **kwargs)
  File "/home/yjw0510/anaconda3/lib/python3.6/site-packages/joblib/parallel.py", line 225, in __call__
    for func, args, kwargs in self.items]
  File "/home/yjw0510/anaconda3/lib/python3.6/site-packages/joblib/parallel.py", line 225, in <listcomp>
    for func, args, kwargs in self.items]
  File "/home/yjw0510/anaconda3/lib/python3.6/site-packages/joblib/memory.py", line 568, in __call__
  File "/home/yjw0510/anaconda3/lib/python3.6/site-packages/joblib/memory.py", line 534, in _cached_call
    out, metadata = self.call(*args, **kwargs)
  File "/home/yjw0510/anaconda3/lib/python3.6/site-packages/joblib/memory.py", line 734, in call
    output = self.func(*args, **kwargs)
  File "02_trj_ConvertToPickle.py", line 65, in to_pickle
    configArray = np.zeros((nAtoms,9))
MemoryError
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "02_trj_ConvertToPickle.py", line 106, in <module>
    res = Parallel(n_jobs=numCPUcores,verbose=32)(delayed(to_pickle)(i, directoryBufferProcessing) for i in fileList)
  File "/home/yjw0510/anaconda3/lib/python3.6/site-packages/joblib/parallel.py", line 934, in __call__
    self.retrieve()
  File "/home/yjw0510/anaconda3/lib/python3.6/site-packages/joblib/parallel.py", line 833, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/home/yjw0510/anaconda3/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 521, in wrap_future_result
    return future.result(timeout=timeout)
  File "/home/yjw0510/anaconda3/lib/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/home/yjw0510/anaconda3/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
MemoryError
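(A side observation, not from the original post: the traceback shows the MemoryError is raised by `np.zeros((nAtoms, 9))` inside the worker, not by `pickle.dump` itself. A rough estimate of the peak allocation can be done as below; `nAtoms` here is a made-up value, since the real one is not given in the question. With `n_jobs=12`, all workers may allocate such an array at the same time.)

```python
import numpy as np

# Hypothetical atom count -- the real nAtoms is not stated in the question
nAtoms = 200_000_000

# np.zeros((nAtoms, 9)) defaults to float64, i.e. 8 bytes per element
per_worker = nAtoms * 9 * np.dtype(np.float64).itemsize
n_jobs = 12

print(per_worker / 1024**3)            # GiB allocated by one worker
print(n_jobs * per_worker / 1024**3)   # GiB if all 12 workers allocate at once
```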

0 Answers:

There are no answers yet.