Loading a pretrained Keras model fails when I try to use transfer learning

Asked: 2019-02-12 10:16:34

Tags: keras

I am trying to build on a pretrained Keras MobileNet model, following the guide on the Keras blog: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html

Here is my code:

preTrainedModel = MobileNetV2(weights = 'imagenet', include_top = False)
preFeatures = preTrainedModel.output
preFeatures = GlobalAveragePooling2D()(preFeatures)
preFeatures = Dense(1024, activation = 'relu')(preFeatures)
predictions = Dense(10, activation = 'softmax')(preFeatures)

# Build the full model on top of the pretrained base
# (the keyword arguments are `inputs`/`outputs`, not `input`/`output`)
model = Model(inputs = preTrainedModel.input, outputs = predictions)
#Layer freezing
for layer in preTrainedModel.layers:
    layer.trainable = False

if os.path.exists(top_layers_checkpoint_path):
    model.load_weights(top_layers_checkpoint_path)
    print ("Checkpoint '" + top_layers_checkpoint_path + "' loaded.")

#Rmsprop optimizer
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'], )

#Save the model after every epoch
mc_top = ModelCheckpoint(top_layers_checkpoint_path, monitor='val_acc', verbose=0, save_best_only=True, save_weights_only=False, mode='auto', period=1)

#Save the TensorBoard logs.
tb = TensorBoard(log_dir='./logs', histogram_freq=1, write_graph=True, write_images=True)

model.fit_generator(datGen, steps_per_epoch = batchPerEpoch, epochs = epochPerPass, validation_data = validateDataFlow, validation_steps = batchPerEpoch, callbacks = [mc_top, tb], use_multiprocessing=False)

for i, layer in enumerate(preTrainedModel.layers):
   print(i, layer.name)
#Save the model after every epoch.
mc_fit = ModelCheckpoint(fine_tuned_checkpoint_path, monitor='val_acc', verbose=0, save_best_only=True, save_weights_only=False, mode='auto', period=1)


if os.path.exists(fine_tuned_checkpoint_path):
        model.load_weights(fine_tuned_checkpoint_path)
        print ("Checkpoint '" + fine_tuned_checkpoint_path + "' loaded.")

# Fine-tune the top of the network: freeze the first 50 layers
# and unfreeze the rest:
for layer in model.layers[:50]:
   layer.trainable = False
for layer in model.layers[50:]:
   layer.trainable = True

model.fit_generator(datGen, steps_per_epoch = batchPerEpoch, epochs = epochPerPass, validation_data = validateDataFlow, validation_steps = batchPerEpoch, callbacks = [mc_fit, tb], use_multiprocessing=False)

But it reports this error:


tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape [1373,32,112,112] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
    [[{{node Conv1/convolution}} = Conv2D[T=DT_FLOAT, _class=["loc:@bn_Conv1/cond/FusedBatchNorm/Switch"], data_format="NCHW", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 2, 2], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Conv1/convolution-0-TransposeNHWCToNCHW-LayoutOptimizer, Conv1/kernel/read)]]

Does this mean the GPU has run out of memory? How can I fix this?
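The shape in the error message hints at the likely cause: the first dimension, 1373, is the batch size, which suggests the generator is feeding the whole dataset through the network in a single batch. A back-of-the-envelope calculation (assuming float32, 4 bytes per element) shows why this one activation tensor alone can exhaust a typical GPU:

```python
# The failing allocation is a single activation tensor of shape
# [1373, 32, 112, 112] with dtype float (float32 = 4 bytes per element).
elements = 1373 * 32 * 112 * 112    # batch * channels * height * width
bytes_needed = elements * 4         # float32
gib = bytes_needed / 2**30
print(f"{gib:.2f} GiB")  # about 2.05 GiB for this single tensor
```

And this is only one intermediate tensor; every layer's activations (plus weights and gradients) need memory at the same time, so a batch of 1373 images cannot fit. Passing an explicit small `batch_size` (e.g. 32) when constructing the generator should shrink this allocation proportionally.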

0 Answers:

No answers yet