I am trying to save a large object (~7 GB) to a pickle binary file inside each joblib worker process. However, joblib raises a MemoryError.
I have plenty of RAM (256 GB) and storage (4 TB), and I gave joblib 12 cores. I have been monitoring memory usage, and it looks fine (more than half of total memory stays free).
The structure of the code is simple:
import pickle
from joblib import Parallel, delayed

def do_something(arg1, arg2):
    ...
    pickle.dump(save_something, open('somefile.p', 'wb'), protocol=-1)
    return 1

JobList = ['a1', 'b1', 'c1', 'd1',
           'a2', 'b2', 'c2', 'd2',
           'a3', 'b3', 'c3', 'd3']
arg2 = 'sth'

Parallel(n_jobs=12)(delayed(do_something)(i, arg2) for i in JobList)
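The traceback below points at the `np.zeros((nAtoms, 9))` allocation, so as a sanity check I sized that array by hand. The `nAtoms` value here is a hypothetical placeholder chosen to match my ~7 GB objects; the real count comes from each trajectory file:

```python
import numpy as np

# Hypothetical atom count chosen so the array matches the ~7 GB objects
# I am saving; the real value is read from each trajectory file.
nAtoms = 100_000_000

# np.zeros((nAtoms, 9)) allocates float64 by default: 8 bytes per element.
bytes_needed = nAtoms * 9 * np.dtype(np.float64).itemsize
print(f"{bytes_needed / 1024**3:.1f} GiB per worker")  # 6.7 GiB
```

Even with 12 workers each holding one such array, that should stay well under 256 GB, which is why the MemoryError surprises me.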
I expected it to finish my jobs normally, but I don't know how to allocate (or allow) more memory for joblib's workers.
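I don't know yet whether this is the right fix, but one variant I could try is requesting a smaller dtype, assuming float32 precision is acceptable for my data. `nAtoms` is again a hypothetical placeholder, scaled down here so the snippet runs anywhere:

```python
import numpy as np

nAtoms = 1_000_000  # hypothetical; the real count is much larger (~1e8)

# Requesting float32 halves the footprint of the default float64 allocation:
# at nAtoms = 1e8 this would be ~3.4 GiB instead of ~6.7 GiB per worker.
configArray = np.zeros((nAtoms, 9), dtype=np.float32)
print(configArray.nbytes)  # 36000000 bytes (4 bytes * 9 * 1e6)
```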
++) Environment: Ubuntu 18.04.2 (64-bit), Python 3.6.8 (GCC 7.3.0)

The full traceback from my actual script (02_trj_ConvertToPickle.py) is:
joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/yjw0510/anaconda3/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py", line 418, in _process_worker
    r = call_item()
  File "/home/yjw0510/anaconda3/lib/python3.6/site-packages/joblib/externals/loky/process_executor.py", line 272, in __call__
    return self.fn(*self.args, **self.kwargs)
  File "/home/yjw0510/anaconda3/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 567, in __call__
    return self.func(*args, **kwargs)
  File "/home/yjw0510/anaconda3/lib/python3.6/site-packages/joblib/parallel.py", line 225, in __call__
    for func, args, kwargs in self.items]
  File "/home/yjw0510/anaconda3/lib/python3.6/site-packages/joblib/parallel.py", line 225, in <listcomp>
    for func, args, kwargs in self.items]
  File "/home/yjw0510/anaconda3/lib/python3.6/site-packages/joblib/memory.py", line 568, in __call__
  File "/home/yjw0510/anaconda3/lib/python3.6/site-packages/joblib/memory.py", line 534, in _cached_call
    out, metadata = self.call(*args, **kwargs)
  File "/home/yjw0510/anaconda3/lib/python3.6/site-packages/joblib/memory.py", line 734, in call
    output = self.func(*args, **kwargs)
  File "02_trj_ConvertToPickle.py", line 65, in to_pickle
    configArray = np.zeros((nAtoms,9))
MemoryError
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "02_trj_ConvertToPickle.py", line 106, in <module>
    res = Parallel(n_jobs=numCPUcores,verbose=32)(delayed(to_pickle)(i, directoryBufferProcessing) for i in fileList)
  File "/home/yjw0510/anaconda3/lib/python3.6/site-packages/joblib/parallel.py", line 934, in __call__
    self.retrieve()
  File "/home/yjw0510/anaconda3/lib/python3.6/site-packages/joblib/parallel.py", line 833, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/home/yjw0510/anaconda3/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 521, in wrap_future_result
    return future.result(timeout=timeout)
  File "/home/yjw0510/anaconda3/lib/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/home/yjw0510/anaconda3/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
MemoryError