Using multi_gpu_model

Date: 2018-01-12 08:48:24

Tags: python tensorflow machine-learning export keras

I am currently importing an exported Keras model into Tensorflow. The code works fine with a sequential model: I can train the model in Python and then import it into my C++ application. Since I need more resources, I decided to distribute the model across several GPUs. After that change, I can no longer import the model.

This is how I created the model before:

  input_img = Input(shape=(imgDim, imgDim, 1)) 

  # add several layers to net

  model = Model(input_img, net)

  model.compile(optimizer='adam', 
                loss='binary_crossentropy',
                metrics=['accuracy'])

  model.fit(x_train, y_train,
            epochs=100,
            batch_size=100,
            shuffle=True,
            validation_data=(x_test, y_test))

  saveKerasModelAsProtobuf(model, outpath)

This is how I export the model:

def saveKerasModelAsProtobuf(model, outputPath):
  signature = tf.saved_model.signature_def_utils.predict_signature_def(
    inputs={'image': model.input}, outputs={'scores': model.output})

  builder = tf.saved_model.builder.SavedModelBuilder(outputPath)
  builder.add_meta_graph_and_variables(
    sess=keras.backend.get_session(),
    tags=[tf.saved_model.tag_constants.SERVING],
    signature_def_map={
      tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
        signature
    }
  )
  builder.save()

  return
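Assuming the export succeeded, a quick sanity check (a sketch using the standard TF 1.x loader API, not part of the original post) is to load the SavedModel back in a fresh Python session and run one dummy image through the serving signature before moving to C++. The tensor names are read from the signature itself, so no layer names need to be hard-coded:

```python
import numpy as np
import tensorflow as tf

def checkSavedModel(exportDir, imgDim):
    """Load an exported SavedModel into a fresh graph and run one
    dummy image through its default serving signature."""
    with tf.Graph().as_default(), tf.Session() as sess:
        # loader.load returns the MetaGraphDef, which carries the signatures
        meta = tf.saved_model.loader.load(
            sess, [tf.saved_model.tag_constants.SERVING], exportDir)
        sig = meta.signature_def[
            tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
        # 'image' / 'scores' match the keys used in predict_signature_def above
        input_name = sig.inputs['image'].name
        output_name = sig.outputs['scores'].name
        dummy = np.zeros((1, imgDim, imgDim, 1), dtype=np.float32)
        return sess.run(output_name, feed_dict={input_name: dummy})
```

If this fails in Python with the same device-placement error as the C++ loader, the problem is in the exported graph itself rather than in the C++ import code.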

This is how I changed the code to run on multiple GPUs:

  input_img = Input(shape=(imgDim, imgDim, 1)) 

  # add several layers to net

  model = Model(input_img, net)

  parallel_model = multi_gpu_model(model, gpus=4)

  parallel_model.compile(optimizer='adam',
                         loss='binary_crossentropy',
                         metrics=['accuracy'])

  parallel_model.fit(x_train, y_train,
                     epochs=100,
                     batch_size=100,
                     shuffle=True,
                     validation_data=(x_test, y_test))

  # export model rather than parallel_model:
  saveKerasModelAsProtobuf(model, outpath)

When I try to import the model in C++ on a single-GPU machine, I get the following error, indicating that it is not actually the sequential model (as I would expect) but the parallel_model:

Cannot assign a device for operation 'replica_3/lambda_4/Shape': Operation was explicitly assigned to /device:GPU:3 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0 ]. Make sure the device specification refers to a valid device.
 [[Node: replica_3/lambda_4/Shape = Shape[T=DT_FLOAT, _output_shapes=[[4]], out_type=DT_INT32, _device="/device:GPU:3"](input_1)]]

From what I've read, the two models are supposed to share the same weights, but not the internal structure. What am I doing wrong? Is there a better/more general way to export the model?
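One possible explanation (my reading, not confirmed in the post): `SavedModelBuilder` serializes the entire session graph, not just the subgraph reachable from the signature, so the replica ops that `multi_gpu_model` created, together with their explicit `/device:GPU:n` pins, end up in the export even when you pass the template `model`. In TF 1.x, `add_meta_graph_and_variables` accepts a `clear_devices=True` flag that strips those device assignments at export time; a sketch of the save function with that one change:

```python
import tensorflow as tf
import keras

def saveKerasModelAsProtobuf(model, outputPath):
    """Export a Keras model as a SavedModel, stripping explicit device
    assignments so the graph can load on machines with other hardware."""
    signature = tf.saved_model.signature_def_utils.predict_signature_def(
        inputs={'image': model.input}, outputs={'scores': model.output})

    builder = tf.saved_model.builder.SavedModelBuilder(outputPath)
    builder.add_meta_graph_and_variables(
        sess=keras.backend.get_session(),
        tags=[tf.saved_model.tag_constants.SERVING],
        signature_def_map={
            tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
                signature
        },
        clear_devices=True)  # drop "/device:GPU:n" pins from the exported graph
    builder.save()
```

Alternatively, the Keras documentation for `multi_gpu_model` recommends instantiating the template model inside a `with tf.device('/cpu:0'):` scope before wrapping it, so its weights are hosted on the CPU; and on the C++ side, enabling soft placement in the session options lets TensorFlow fall back when a pinned device is missing.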

Thanks!

0 Answers:

No answers yet.