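Memory error when initializing Xception with Keras

Time: 2017-08-28 21:07:17

Tags: tensorflow out-of-memory keras gpu

I am having trouble adapting a pre-trained Xception model for binary classification on a new set of classes. The model is returned successfully by the following function: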

#adapted from:
#https://github.com/fchollet/keras/issues/4465

from keras.applications.xception import Xception
from keras.layers import Input, Flatten, Dense
from keras.models import Model

def get_xception(in_shape,trn_conv):
  #Get back the convolutional part of Xception trained on ImageNet
  model = Xception(weights='imagenet', include_top=False)

  #Here the input images have been resized to 299x299x3, so this is the
  #same as Xception's native input
  input = Input(in_shape,name = 'image_input')

  #Use the generated model 
  output = model(input)

  #Only train the top fully connected layers (keep pre-trained feature extractors)
  for layer in model.layers:
      layer.trainable = False
  #Add the fully-connected layers 
  x = Flatten(name='flatten')(output)
  x = Dense(2048, activation='relu', name='fc1')(x)
  x = Dense(2048, activation='relu', name='fc2')(x)
  x = Dense(2, activation='softmax', name='predictions')(x)

  #Create your own model (Keras 2 names these arguments inputs/outputs)
  my_model = Model(inputs=input, outputs=x)
  my_model.compile(loss='binary_crossentropy', optimizer='SGD')

  return my_model

That works fine, but when I run this code:

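from keras.utils.io_utils import HDF5Matrix

#shp, trn_feat and str_trn are defined elsewhere in the script
model = get_xception(shp, trn_feat)
in_data = HDF5Matrix(str_trn, '/inputs')
labels = HDF5Matrix(str_trn, '/labels')
model.fit(in_data, labels, shuffle="batch")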

I get the following error:

File "/home/tsmith/.virtualenvs/keras/local/lib/python2.7/site-packages/keras/engine/training.py", line 1576, in fit
  self._make_train_function()
File "/home/tsmith/.virtualenvs/keras/local/lib/python2.7/site-packages/keras/engine/training.py", line 960, in _make_train_function
  loss=self.total_loss)
File "/home/tsmith/.virtualenvs/keras/local/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 87, in wrapper
  return func(*args, **kwargs)
File "/home/tsmith/.virtualenvs/keras/local/lib/python2.7/site-packages/keras/optimizers.py", line 169, in get_updates
  v = self.momentum * m - lr * g  # velocity
File "/home/tsmith/.virtualenvs/keras/local/lib/python2.7/site-packages/tensorflow/python/ops/variables.py", line 705, in _run_op
  return getattr(ops.Tensor, operator)(a._AsTensor(), *args)
File "/home/tsmith/.virtualenvs/keras/local/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 865, in binary_op_wrapper
  return func(x, y, name=name)
File "/home/tsmith/.virtualenvs/keras/local/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 1088, in _mul_dispatch
  return gen_math_ops._mul(x, y, name=name)
File "/home/tsmith/.virtualenvs/keras/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1449, in _mul
  result = _op_def_lib.apply_op("Mul", x=x, y=y, name=name)
File "/home/tsmith/.virtualenvs/keras/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
  op_def=op_def)
File "/home/tsmith/.virtualenvs/keras/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
  original_op=self._default_original_op, op_def=op_def)
File "/home/tsmith/.virtualenvs/keras/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
  self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[204800,2048]
  [[Node: training/SGD/mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](SGD/momentum/read, training/SGD/Variable/read)]]
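
For scale, the tensor named in the error message can be sized with a quick calculation. This is only a sketch. It assumes 4-byte floats (the node reports T=DT_FLOAT) and that the 204800 comes from flattening Xception's 10x10x2048 feature map for a 299x299 input, which would make [204800, 2048] the shape of the fc1 weight matrix.

```python
# Size of the tensor in the OOM message: shape [204800, 2048], float32.
rows, cols = 204800, 2048
bytes_per_float32 = 4

# Assumption: 204800 is the flattened 10x10x2048 Xception output.
assert rows == 10 * 10 * 2048

n_values = rows * cols
n_bytes = n_values * bytes_per_float32
n_gib = n_bytes / 1024.0 ** 3

print(n_values)  # 419430400
print(n_gib)     # 1.5625
```

One tensor of this shape therefore takes about 1.56 GiB. Training needs several of them at once: the weights, their gradient, and the SGD update terms that appear in the traceback.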

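I have been tracing function calls for hours and still cannot figure out what is happening. The system should far exceed the requirements. System specs: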

Ubuntu Version: 14.04.5 LTS
Tensorflow Version: 1.3.0
Keras Version: 2.0.7
28x dual core Intel Xeon processor (1.2 GHz)
4x NVIDIA GeForce 1080 (8 GB memory each)

Any clues as to what is going wrong here?

1 Answer:

Answer 0 (score: 0)

Per Yu-Yang, the simplest solution is to reduce the batch size, after which everything ran fine!
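
A minimal sketch of what reducing the batch size could look like with the call from the question. The helper name `fit_with_batch_size` and the value 8 are illustrative assumptions, since the answer does not say which batch size was used in the end.

```python
def fit_with_batch_size(model, inputs, labels, batch_size=8):
    """Call model.fit with an explicit batch size.

    The helper name and the default of 8 are hypothetical; choose the
    largest value that fits in GPU memory.
    """
    # Keras falls back to batch_size=32 when fit() is given none, so passing
    # a smaller value reduces how many samples are processed per step.
    return model.fit(inputs, labels, batch_size=batch_size, shuffle="batch")

# Usage with the names from the question:
# model = get_xception(shp, trn_feat)
# fit_with_batch_size(model, in_data, labels, batch_size=8)
```

Halving the value until the error disappears is a simple way to find a workable setting.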