Keras MemoryError: alloc failed on Windows

Time: 2017-01-26 09:16:16

Tags: python numpy deep-learning keras

I want to train my network (image classification, unfortunately on a CPU). I have 71,000 records: 48x48 grayscale images. (Saved as a numpy array, the data set is 1.4 GB.)

After a few minutes I get the following error message:

Epoch 1/50
   3200/57419 [>.............................] - ETA: 5381s - loss: 1.9127 - acc: 0.2338Traceback (most recent call last):
    File "D:/Emotion-Recognition/trainEmotionRecognizer.py", line 68, in <module>
      verbose=1, callbacks=callbacks)
    File "C:\Users\Gabor\Anaconda2\lib\site-packages\keras\models.py", line 664, in fit
      sample_weight=sample_weight)
    File "C:\Users\Gabor\Anaconda2\lib\site-packages\keras\engine\training.py", line 1143, in fit
      initial_epoch=initial_epoch)
    File "C:\Users\Gabor\Anaconda2\lib\site-packages\keras\engine\training.py", line 843, in _fit_loop
      outs = f(ins_batch)
    File "C:\Users\Gabor\Anaconda2\lib\site-packages\keras\backend\theano_backend.py", line 919, in __call__
      return self.function(*inputs)
    File "C:\Users\Gabor\Anaconda2\lib\site-packages\theano\compile\function_module.py", line 886, in __call__
      storage_map=getattr(self.fn, 'storage_map', None))
    File "C:\Users\Gabor\Anaconda2\lib\site-packages\theano\gof\link.py", line 325, in raise_with_op
      reraise(exc_type, exc_value, exc_trace)
    File "C:\Users\Gabor\Anaconda2\lib\site-packages\theano\compile\function_module.py", line 873, in __call__
      self.fn() if output_subset is None else\
  MemoryError: alloc failed
  Apply node that caused the error: Alloc(TensorConstant{(1L, 1L, 1..1L) of 0.0}, if{shape,inplace}.0, TensorConstant{64}, if{shape,inplace}.2, if{shape,inplace}.3)
  Toposort index: 126
  Inputs types: [TensorType(float32, (True, True, True, True)), TensorType(int64, scalar), TensorType(int64, scalar), TensorType(int64, scalar), TensorType(int64, scalar)]
  Inputs shapes: [(1L, 1L, 1L, 1L), (), (), (), ()]
  Inputs strides: [(4L, 4L, 4L, 4L), (), (), (), ()]
  Inputs values: [array([[[[ 0.]]]], dtype=float32), array(64L, dtype=int64), array(64L, dtype=int64), array(24L, dtype=int64), array(24L, dtype=int64)]
  Outputs clients: [[if{inplace}(keras_learning_phase, Alloc.0, CorrMM_gradInputs{half, (1, 1), (1, 1)}.0), if{inplace}(keras_learning_phase, CorrMM_gradInputs{half, (1, 1), (1, 1)}.0, Alloc.0)]]

  Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
    File "C:\Users\Gabor\Anaconda2\lib\site-packages\theano\gradient.py", line 1272, in access_grad_cache
      term = access_term_cache(node)[idx]
    File "C:\Users\Gabor\Anaconda2\lib\site-packages\theano\gradient.py", line 965, in access_term_cache
      output_grads = [access_grad_cache(var) for var in node.outputs]
    File "C:\Users\Gabor\Anaconda2\lib\site-packages\theano\gradient.py", line 1272, in access_grad_cache
      term = access_term_cache(node)[idx]
    File "C:\Users\Gabor\Anaconda2\lib\site-packages\theano\gradient.py", line 965, in access_term_cache
      output_grads = [access_grad_cache(var) for var in node.outputs]
    File "C:\Users\Gabor\Anaconda2\lib\site-packages\theano\gradient.py", line 1272, in access_grad_cache
      term = access_term_cache(node)[idx]
    File "C:\Users\Gabor\Anaconda2\lib\site-packages\theano\gradient.py", line 1106, in access_term_cache
      new_output_grads)
    File "C:\Users\Gabor\Anaconda2\lib\site-packages\theano\gof\op.py", line 700, in L_op
      return self.grad(inputs, output_grads)
    File "C:\Users\Gabor\Anaconda2\lib\site-packages\theano\ifelse.py", line 223, in grad
      for i, t in enumerate(ts)])

  HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

The laptop I am using has 8 GB of RAM. When I watch the performance monitor while training is running, memory usage reaches 100% after a while.

I don't know how to train my network now.

This is the model structure:

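# Keras 1.x API (Convolution2D with border_mode), matching the traceback above
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()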
model.add(Convolution2D(32, 3, 3, border_mode='same', activation='relu', input_shape=(1,48,48)))
model.add(Dropout(0.3))
model.add(Convolution2D(32, 3, 3, border_mode='same', activation='relu'))
model.add(Dropout(0.3))
model.add(Convolution2D(32, 3, 3, border_mode='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Convolution2D(64, 3, 3, border_mode='same', activation='relu'))
model.add(Dropout(0.3))
model.add(Convolution2D(64, 3, 3, border_mode='same', activation='relu'))
model.add(Dropout(0.3))
model.add(Convolution2D(64, 3, 3, border_mode='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Convolution2D(128, 3, 3, border_mode='same', activation='relu'))
model.add(Dropout(0.3))
model.add(Convolution2D(128, 3, 3, border_mode='same', activation='relu'))
model.add(Dropout(0.3))
model.add(Convolution2D(128, 3, 3, border_mode='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(6, activation='softmax'))

I use a batch size of 64, and the training images have shape (1, 48, 48) with dtype = uint8.
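As a rough sanity check on the figures in the question, the size of the raw image array can be worked out by hand for a few element widths. This is only a sketch: the helper `array_bytes` is a hypothetical name, and which dtype the 1.4 GB array actually used is not stated in the question.

```python
def array_bytes(n_samples, sample_shape, bytes_per_element):
    """Size in bytes of a dense array of n_samples items, each of sample_shape."""
    elements = n_samples
    for dim in sample_shape:
        elements *= dim
    return elements * bytes_per_element


N_IMAGES = 71000          # record count given in the question
SHAPE = (1, 48, 48)       # per-image shape given in the question

# Element widths in bytes for three common numpy dtypes.
for name, width in (("uint8", 1), ("float32", 4), ("float64", 8)):
    size = array_bytes(N_IMAGES, SHAPE, width)
    print("%-8s %.2f GB" % (name, size / 1e9))
```

The point of the comparison is that the same images cost eight times more memory as float64 than as uint8, so an unintended dtype conversion before training can matter on an 8 GB machine.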

How can I fix this error and train my network?

2 Answers:

Answer 0 (score: 1)

It turned out that Theano had a memory leak. When I trained with TensorFlow as the Keras backend instead, it worked.

So if you run into a similar problem, change the backend (and pay attention to dim_ordering).
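A minimal sketch of what the backend switch involves, assuming a Keras 1.x installation like the one in the traceback. Keras reads the `KERAS_BACKEND` environment variable at import time (the `backend` entry in `keras.json` is the persistent alternative). The helper `input_shape_for` is hypothetical and only illustrates how the question's `input_shape=(1,48,48)` would change between the "th" and "tf" orderings.

```python
import os

# Must be set before the first `import keras` in the process.
os.environ["KERAS_BACKEND"] = "tensorflow"


def input_shape_for(dim_ordering, channels=1, height=48, width=48):
    """Hypothetical helper: per-image shape for a given Keras 1.x dim ordering."""
    if dim_ordering == "th":      # Theano convention: channels first
        return (channels, height, width)
    if dim_ordering == "tf":      # TensorFlow convention: channels last
        return (height, width, channels)
    raise ValueError("dim_ordering must be 'th' or 'tf'")


print(input_shape_for("th"))
print(input_shape_for("tf"))
```

With the "tf" ordering, the training array itself also has to be laid out channels-last, not only the `input_shape` argument of the first layer.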

Answer 1 (score: 0)

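To reduce memory requirements, you can shrink the model (each Dropout layer also needs as much memory as the Convolution layer before it), or you can reduce the batch size. Either way, training a model like this on a CPU may take days or even weeks. If at all possible, I strongly recommend running the computation on a GPU.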
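To see how batch size and the Dropout layers enter the memory budget, the forward-pass layer outputs of the model in the question can be counted by hand. This is a back-of-the-envelope sketch, not a measurement: the function names are hypothetical, float32 activations are assumed, and gradients, weights, and any backend workspace buffers are ignored, so real usage will be higher.

```python
BYTES_PER_FLOAT32 = 4


def activation_floats_per_sample(input_size=48, conv_filters=(32, 64, 128),
                                 dense_units=(256, 64, 32), n_classes=6):
    """Count forward-pass output values per image for the question's model."""
    total = 0
    size = input_size
    for filters in conv_filters:
        conv_out = filters * size * size    # 'same' padding keeps height/width
        total += 5 * conv_out               # 3 conv outputs + 2 dropout outputs
        size //= 2                          # 2x2 max pooling halves each side
        total += filters * size * size      # pooling output
    total += conv_filters[-1] * size * size  # Flatten output
    for units in dense_units:
        total += 2 * units                  # Dense output + its Dropout output
    total += n_classes                      # softmax output
    return total


def batch_activation_bytes(batch_size):
    """Bytes of float32 forward activations for one batch."""
    return activation_floats_per_sample() * batch_size * BYTES_PER_FLOAT32


for batch_size in (64, 32, 16):
    print("batch %d: %.1f MB" % (batch_size, batch_activation_bytes(batch_size) / 1e6))
```

The estimate scales linearly with batch size, and in each convolution block the two Dropout outputs account for two of the five full-resolution tensors, which is the effect this answer describes.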