Bad model detected when deploying a model on Google ML Engine

Date: 2019-10-31 12:16:48

Tags: google-cloud-ml

I trained a model on Google ML Engine with the following configuration:

JOB_NAME=object_detection"_$(date +%m_%d_%Y_%H_%M_%S)"
echo $JOB_NAME
gcloud ml-engine jobs submit training $JOB_NAME \
        --job-dir=gs://$1/train \
        --scale-tier BASIC_GPU \
        --runtime-version 1.12 \
        --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz,/tmp/pycocotools/pycocotools-2.0.tar.gz \
        --module-name object_detection.model_main \
        --region europe-west1 \
        -- \
        --model_dir=gs://$1/train \
        --pipeline_config_path=gs://$1/data/fast_rcnn_resnet101_coco.config

After training, I downloaded the latest checkpoint from GCP and exported the model with the following command:

python export_inference_graph.py \
    --input_type encoded_image_string_tensor \
    --pipeline_config_path training/fast_rcnn_resnet101_coco.config \
    --trained_checkpoint_prefix training/model.ckpt-11127 \
    --output_directory exported_graphs

I then deployed the exported model as a version on ML Engine.
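The original post does not show the exact deployment command. As a hedged sketch only: creating a version on ML Engine of that era looked roughly like the following, where the model name, version name, and bucket path are placeholders (not taken from the question), and `--runtime-version` should match the `1.12` used in the training job:

```shell
# Hypothetical deployment sketch; my_model, v1 and the origin path
# are placeholders, not values from the original question.
gcloud ml-engine versions create v1 \
    --model my_model \
    --origin gs://$1/exported_graphs/saved_model \
    --runtime-version 1.12
```

The key point for this question is the `--runtime-version` flag: it selects the TensorFlow build that will load the SavedModel.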

I receive the following error:

Create Version failed. Bad model detected with error: "Failed to load model: Could not load servable: {name: default version: 1} failed: Not found: Op type not registered 'FusedBatchNormV3' in binary running on localhost. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.\n\n(Error code: 0)"

The SignatureDef of the exported SavedModel is:

The given SavedModel SignatureDef contains the following input(s):
  inputs['inputs'] tensor_info:
      dtype: DT_UINT8
      shape: (-1, -1, -1, 3)
      name: image_tensor:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['detection_boxes'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 300, 4)
      name: detection_boxes:0
  outputs['detection_classes'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 300)
      name: detection_classes:0
  outputs['detection_features'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, -1, -1, -1, -1)
      name: detection_features:0
  outputs['detection_multiclass_scores'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 300, 2)
      name: detection_multiclass_scores:0
  outputs['detection_scores'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 300)
      name: detection_scores:0
  outputs['num_detections'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1)
      name: num_detections:0
  outputs['raw_detection_boxes'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 300, 4)
      name: raw_detection_boxes:0
  outputs['raw_detection_scores'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 300, 2)
      name: raw_detection_scores:0
Method name is: tensorflow/serving/predict

2 answers:

Answer 0: (score: 2)

Most likely there is a TensorFlow version incompatibility somewhere, e.g. between the model and the runtime. Did you create the model with the same TensorFlow version that the runtime actually uses?

Several threads seem to confirm this answer:

Not found: Op type not registered 'CountExtremelyRandomStats'

Bad model deploying to GCP Cloudml
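The "Not found: Op type not registered" failure is mechanical: a serving binary only knows the ops registered at its own TensorFlow version, and `FusedBatchNormV3` only exists in releases newer than the 1.12 runtime (it appeared around TF 1.14). A small illustrative sketch of that mismatch, where the op lists are made up for illustration and not read from a real binary:

```python
def missing_ops(graph_ops, runtime_ops):
    """Return the ops an exported graph uses but the runtime never registered."""
    return sorted(set(graph_ops) - set(runtime_ops))

# A TF >= 1.14 export emits FusedBatchNormV3; a 1.12 serving runtime only
# registers the older FusedBatchNorm, so loading the servable fails.
graph_ops = ["Conv2D", "FusedBatchNormV3", "Relu"]
runtime_1_12_ops = ["Conv2D", "FusedBatchNorm", "Relu"]
print(missing_ops(graph_ops, runtime_1_12_ops))  # ['FusedBatchNormV3']
```

Any op name this returns will reproduce exactly the "Op type not registered" error seen above.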

Answer 1: (score: 0)

I was able to figure this out: I was using a different TensorFlow version when exporting the model. To keep things consistent and avoid this kind of error, make sure the TensorFlow versions used during training, exporting, and deployment are all the same.
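One way to enforce that consistency is a guard before exporting. A minimal sketch, assuming a major.minor comparison is sufficient (patch releases generally keep the op set compatible); the version strings here are examples, not read from an installed TensorFlow:

```python
def versions_match(local_version, runtime_version):
    """Compare only major.minor, e.g. '1.12.3' vs '1.12'."""
    return local_version.split(".")[:2] == runtime_version.split(".")[:2]

# Before exporting, compare the local TF version (tf.__version__)
# against the ML Engine --runtime-version you plan to deploy with.
print(versions_match("1.12.3", "1.12"))  # True: safe to export
print(versions_match("1.14.0", "1.12"))  # False: export may contain newer ops
```

If the check fails, either upgrade `--runtime-version` on deployment or downgrade the local TensorFlow used for export.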