Dill MemoryError when loading a serialized object — how do I fix it?

Asked: 2015-09-08 18:01:35

Tags: python serialization scikit-learn deserialization dill

I'm getting a dill/pickle MemoryError when loading a serialized object file. I'm not sure what is happening, and I'm not sure how to fix it. When I call:

stat_bundle = train_batch_iterator(clf, TOTAL_TRAINED_EVENTS)

the code enters the train_batch_iterator function, which loads the serialized objects and trains the classifier with the data inside them. Here is the code:

def train_batch_iterator(clf, tte):
    plot_data = []  # initialize plot data array
    for file in glob.glob('./SerializedData/Batch8172015_19999/*'):
        with open(file, 'rb') as stream:
            minibatch_train = dill.load(stream)
            clf.partial_fit(minibatch_train.data[1], minibatch_train.target,
                            classes=np.array([11, 111]))
            tte += len(minibatch_train.target)
            plot_data.append((test_batch_iterator(clf), tte))
    return plot_data
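
In case it helps to isolate things, below is a stripped-down sketch of the same loop that only deserializes each batch and skips the training step, so only dill.load is exercised (test_batch_iterator and clf.partial_fit are deliberately left out; this is a diagnostic sketch, not my actual training code):

import glob
import dill

# Diagnostic loop: deserialize each batch but do not train, to rule out
# clf.partial_fit as a contributor to the MemoryError.
for file in glob.glob('./SerializedData/Batch8172015_19999/*'):
    with open(file, 'rb') as stream:
        minibatch_train = dill.load(stream)
    print(file, len(minibatch_train.target))
    del minibatch_train  # release the batch before loading the next one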

Here is the error:

Traceback (most recent call last):
  File "LArSoftSGD-version2.0.py", line 154, in <module>
    stat_bundle = train_batch_iterator(clf, TOTAL_TRAINED_EVENTS)
  File "LArSoftSGD-version2.0.py", line 118, in train_batch_iterator
    minibatch_train = dill.load(stream)
  File "/home/jdoe/.local/lib/python3.4/site-packages/dill/dill.py", line 199, in load
    obj = pik.load()
  File "/home/jdoe/.local/lib/python3.4/pickle.py", line 1038, in load
    dispatch[key[0]](self)
  File "/home/jdoe/.local/lib/python3.4/pickle.py", line 1184, in load_binbytes
    self.append(self.read(len))
  File "/home/jdoe/.local/lib/python3.4/pickle.py", line 237, in read
    return self.file_read(n) 
MemoryError

I have no idea what could be going wrong. The error seems to be at the line minibatch_train = dill.load(stream), and the only thing I can think of is that the serialized data file is too large, but the file is exactly 1161 MB, which doesn't seem big enough to cause a memory error. Does anyone know what might be going wrong?
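
For reference, here is a quick check one could run to compare the batch file's on-disk size against the RAM actually available at load time; psutil and the example file name are assumptions here, not part of my script:

import os
import dill
import psutil  # assumed third-party dependency, used only for this check

path = './SerializedData/Batch8172015_19999/batch_0'  # hypothetical file name

# Compare the pickle's on-disk size with the memory currently available;
# dill.load can need noticeably more than the file size while it
# reconstructs the objects.
file_mb = os.path.getsize(path) / 1e6
avail_mb = psutil.virtual_memory().available / 1e6
print('file: %.0f MB, available RAM: %.0f MB' % (file_mb, avail_mb))

with open(path, 'rb') as stream:
    minibatch_train = dill.load(stream)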

0 Answers:

No answers yet