Bad model detected when deploying a model on Google ML Engine

Date: 2019-10-31 12:16:48

Tags: google-cloud-ml

I trained a model on Google ML Engine with the following configuration:

JOB_NAME=object_detection"_$(date +%m_%d_%Y_%H_%M_%S)"
echo $JOB_NAME
gcloud ml-engine jobs submit training $JOB_NAME \
        --job-dir=gs://$1/train \
        --scale-tier BASIC_GPU \
        --runtime-version 1.12 \
        --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz,/tmp/pycocotools/pycocotools-2.0.tar.gz \
        --module-name object_detection.model_main \
        --region europe-west1 \
        -- \
        --model_dir=gs://$1/train \
        --pipeline_config_path=gs://$1/data/fast_rcnn_resnet101_coco.config

After training, I downloaded the latest checkpoint from GCP and exported the model with the following command:

python export_inference_graph.py \
    --input_type encoded_image_string_tensor \
    --pipeline_config_path training/fast_rcnn_resnet101_coco.config \
    --trained_checkpoint_prefix training/model.ckpt-11127 \
    --output_directory exported_graphs

I then deployed the exported model as a version on ML Engine.
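The original post does not show the exact deployment command. As a hedged sketch only: creating a version on ML Engine of that era looked roughly like the following, where the model name, version name, and bucket path are placeholders (not taken from the question), and `--runtime-version` should match the `1.12` used in the training job:

```shell
# Hypothetical deployment sketch; my_model, v1 and the origin path
# are placeholders, not values from the original question.
gcloud ml-engine versions create v1 \
    --model my_model \
    --origin gs://$1/exported_graphs/saved_model \
    --runtime-version 1.12
```

The key point for this question is the `--runtime-version` flag: it selects the TensorFlow build that will load the SavedModel.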

I receive the following error:

Create Version failed. Bad model detected with error: "Failed to load model: Could not load servable: {name: default version: 1} failed: Not found: Op type not registered 'FusedBatchNormV3' in binary running on localhost. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed.\n\n(Error code: 0)"

The SignatureDef of the exported SavedModel is:

The given SavedModel SignatureDef contains the following input(s):
  inputs['inputs'] tensor_info:
      dtype: DT_UINT8
      shape: (-1, -1, -1, 3)
      name: image_tensor:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['detection_boxes'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 300, 4)
      name: detection_boxes:0
  outputs['detection_classes'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 300)
      name: detection_classes:0
  outputs['detection_features'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, -1, -1, -1, -1)
      name: detection_features:0
  outputs['detection_multiclass_scores'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 300, 2)
      name: detection_multiclass_scores:0
  outputs['detection_scores'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 300)
      name: detection_scores:0
  outputs['num_detections'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1)
      name: num_detections:0
  outputs['raw_detection_boxes'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 300, 4)
      name: raw_detection_boxes:0
  outputs['raw_detection_scores'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 300, 2)
      name: raw_detection_scores:0
Method name is: tensorflow/serving/predict

2 answers:

Answer 0: (score: 2)

Most likely there is a TensorFlow version incompatibility somewhere, e.g. between the model and the runtime. Did you create the model with the same TensorFlow version that the runtime actually uses?

Several threads seem to confirm this answer:

Not found: Op type not registered 'CountExtremelyRandomStats'

Bad model deploying to GCP Cloudml
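The "Not found: Op type not registered" failure is mechanical: a serving binary only knows the ops registered at its own TensorFlow version, and `FusedBatchNormV3` only exists in releases newer than the 1.12 runtime (it appeared around TF 1.14). A small illustrative sketch of that mismatch, where the op lists are made up for illustration and not read from a real binary:

```python
def missing_ops(graph_ops, runtime_ops):
    """Return the ops an exported graph uses but the runtime never registered."""
    return sorted(set(graph_ops) - set(runtime_ops))

# A TF >= 1.14 export emits FusedBatchNormV3; a 1.12 serving runtime only
# registers the older FusedBatchNorm, so loading the servable fails.
graph_ops = ["Conv2D", "FusedBatchNormV3", "Relu"]
runtime_1_12_ops = ["Conv2D", "FusedBatchNorm", "Relu"]
print(missing_ops(graph_ops, runtime_1_12_ops))  # ['FusedBatchNormV3']
```

Any op name this returns will reproduce exactly the "Op type not registered" error seen above.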

Answer 1: (score: 0)

I was able to figure this out: I was using a different TensorFlow version when exporting the model. To keep things consistent and avoid this kind of error, make sure the TensorFlow versions used during training, exporting, and deployment are all the same.
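One way to enforce that consistency is a guard before exporting. A minimal sketch, assuming a major.minor comparison is sufficient (patch releases generally keep the op set compatible); the version strings here are examples, not read from an installed TensorFlow:

```python
def versions_match(local_version, runtime_version):
    """Compare only major.minor, e.g. '1.12.3' vs '1.12'."""
    return local_version.split(".")[:2] == runtime_version.split(".")[:2]

# Before exporting, compare the local TF version (tf.__version__)
# against the ML Engine --runtime-version you plan to deploy with.
print(versions_match("1.12.3", "1.12"))  # True: safe to export
print(versions_match("1.14.0", "1.12"))  # False: export may contain newer ops
```

If the check fails, either upgrade `--runtime-version` on deployment or downgrade the local TensorFlow used for export.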