I want to train my network (image classification, unfortunately on a CPU). I have 71,000 records: 48x48 grayscale images (when I save them to a numpy array, it is about 1.4 GB).
After a few minutes I get the following error message:
Epoch 1/50
3200/57419 [>.............................] - ETA: 5381s - loss: 1.9127 - acc: 0.2338
Traceback (most recent call last):
File "D:/Emotion-Recognition/trainEmotionRecognizer.py", line 68, in <module>
verbose=1, callbacks=callbacks)
File "C:\Users\Gabor\Anaconda2\lib\site-packages\keras\models.py", line 664, in fit
sample_weight=sample_weight)
File "C:\Users\Gabor\Anaconda2\lib\site-packages\keras\engine\training.py", line 1143, in fit
initial_epoch=initial_epoch)
File "C:\Users\Gabor\Anaconda2\lib\site-packages\keras\engine\training.py", line 843, in _fit_loop
outs = f(ins_batch)
File "C:\Users\Gabor\Anaconda2\lib\site-packages\keras\backend\theano_backend.py", line 919, in __call__
return self.function(*inputs)
File "C:\Users\Gabor\Anaconda2\lib\site-packages\theano\compile\function_module.py", line 886, in __call__
storage_map=getattr(self.fn, 'storage_map', None))
File "C:\Users\Gabor\Anaconda2\lib\site-packages\theano\gof\link.py", line 325, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File "C:\Users\Gabor\Anaconda2\lib\site-packages\theano\compile\function_module.py", line 873, in __call__
self.fn() if output_subset is None else\
MemoryError: alloc failed
Apply node that caused the error: Alloc(TensorConstant{(1L, 1L, 1..1L) of 0.0}, if{shape,inplace}.0, TensorConstant{64}, if{shape,inplace}.2, if{shape,inplace}.3)
Toposort index: 126
Inputs types: [TensorType(float32, (True, True, True, True)), TensorType(int64, scalar), TensorType(int64, scalar), TensorType(int64, scalar), TensorType(int64, scalar)]
Inputs shapes: [(1L, 1L, 1L, 1L), (), (), (), ()]
Inputs strides: [(4L, 4L, 4L, 4L), (), (), (), ()]
Inputs values: [array([[[[ 0.]]]], dtype=float32), array(64L, dtype=int64), array(64L, dtype=int64), array(24L, dtype=int64), array(24L, dtype=int64)]
Outputs clients: [[if{inplace}(keras_learning_phase, Alloc.0, CorrMM_gradInputs{half, (1, 1), (1, 1)}.0), if{inplace}(keras_learning_phase, CorrMM_gradInputs{half, (1, 1), (1, 1)}.0, Alloc.0)]]
Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
File "C:\Users\Gabor\Anaconda2\lib\site-packages\theano\gradient.py", line 1272, in access_grad_cache
term = access_term_cache(node)[idx]
File "C:\Users\Gabor\Anaconda2\lib\site-packages\theano\gradient.py", line 965, in access_term_cache
output_grads = [access_grad_cache(var) for var in node.outputs]
File "C:\Users\Gabor\Anaconda2\lib\site-packages\theano\gradient.py", line 1272, in access_grad_cache
term = access_term_cache(node)[idx]
File "C:\Users\Gabor\Anaconda2\lib\site-packages\theano\gradient.py", line 965, in access_term_cache
output_grads = [access_grad_cache(var) for var in node.outputs]
File "C:\Users\Gabor\Anaconda2\lib\site-packages\theano\gradient.py", line 1272, in access_grad_cache
term = access_term_cache(node)[idx]
File "C:\Users\Gabor\Anaconda2\lib\site-packages\theano\gradient.py", line 1106, in access_term_cache
new_output_grads)
File "C:\Users\Gabor\Anaconda2\lib\site-packages\theano\gof\op.py", line 700, in L_op
return self.grad(inputs, output_grads)
File "C:\Users\Gabor\Anaconda2\lib\site-packages\theano\ifelse.py", line 223, in grad
for i, t in enumerate(ts)])
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
The laptop I am using has 8 GB of RAM, and when I watch the performance monitor while training runs, memory usage reaches 100% after a while.
I don't know how I can train my network now.
Here is the model structure:
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()
# Block 1: three 3x3 conv layers with 32 filters, then 2x2 max pooling
model.add(Convolution2D(32, 3, 3, border_mode='same', activation='relu', input_shape=(1,48,48)))
model.add(Dropout(0.3))
model.add(Convolution2D(32, 3, 3, border_mode='same', activation='relu'))
model.add(Dropout(0.3))
model.add(Convolution2D(32, 3, 3, border_mode='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# Block 2: 64 filters
model.add(Convolution2D(64, 3, 3, border_mode='same', activation='relu'))
model.add(Dropout(0.3))
model.add(Convolution2D(64, 3, 3, border_mode='same', activation='relu'))
model.add(Dropout(0.3))
model.add(Convolution2D(64, 3, 3, border_mode='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# Block 3: 128 filters
model.add(Convolution2D(128, 3, 3, border_mode='same', activation='relu'))
model.add(Dropout(0.3))
model.add(Convolution2D(128, 3, 3, border_mode='same', activation='relu'))
model.add(Dropout(0.3))
model.add(Convolution2D(128, 3, 3, border_mode='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# Classifier head: flatten, then dense layers down to 6 softmax outputs
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(6, activation='softmax'))
I use a batch size of 64, and the training images have shape (1, 48, 48) with dtype = uint8.
How can I fix this error and train my network?
Answer 0 (score: 1)
It turned out that Theano had a memory leak: when I tried training with TensorFlow as the Keras backend, it worked.
So if you run into a similar problem, change the backend (and mind the dim_ordering setting).
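For reference, a minimal sketch of what that change involves, assuming Keras 1.x (which the traceback points to): the backend is selected via the "backend" entry in ~/.keras/keras.json, and the dimension ordering can be checked or forced from Python. Since the model above declares channel-first inputs of shape (1, 48, 48), the Theano-style 'th' ordering has to be kept even when running on the TensorFlow backend.

from keras import backend as K

# Keep Theano-style channel-first ordering so input_shape=(1, 48, 48) still matches,
# even after switching "backend" to "tensorflow" in ~/.keras/keras.json.
K.set_image_dim_ordering('th')
print(K.image_dim_ordering())  # expected: 'th'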
Answer 1 (score: 0)
To reduce the memory requirements, you can shrink the model (a Dropout layer also needs memory equal to the output of the preceding Convolution layer), or you can reduce the batch size. In any case, training a model like this on a CPU may take days or even weeks; if at all possible, I strongly recommend running the computation on a GPU.
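As a rough illustration only (X_train, y_train and callbacks here are placeholders, not the original variable names from the script), reducing the batch size is a one-keyword change in the Keras 1.x fit call:

# Hypothetical fit call with a smaller batch size; peak memory during
# back-propagation scales roughly with the batch size.
model.fit(X_train, y_train,
          batch_size=16,        # down from 64
          nb_epoch=50,
          verbose=1,
          callbacks=callbacks)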