Question

I was able to create a SavedModel with this script by rhaertel80 in order to deploy TensorFlow for Poets to Cloud ML Engine:

import tensorflow as tf
from tensorflow.python.saved_model import signature_constants
from tensorflow.python.saved_model import tag_constants
from tensorflow.python.saved_model import builder as saved_model_builder

input_graph = 'retrained_graph.pb'
saved_model_dir = 'my_model'

with tf.Graph().as_default() as graph:
  # Read in the export graph
  with tf.gfile.FastGFile(input_graph, 'rb') as f:
      graph_def = tf.GraphDef()
      graph_def.ParseFromString(f.read())
      tf.import_graph_def(graph_def, name='')

  # Define SavedModel Signature (inputs and outputs)
  in_image = graph.get_tensor_by_name('DecodeJpeg/contents:0')
  inputs = {'image_bytes': tf.saved_model.utils.build_tensor_info(in_image)}

  out_classes = graph.get_tensor_by_name('final_result:0')
  outputs = {'prediction': tf.saved_model.utils.build_tensor_info(out_classes)}

  signature = tf.saved_model.signature_def_utils.build_signature_def(
      inputs=inputs,
      outputs=outputs,
      method_name='tensorflow/serving/predict'
  )

  with tf.Session(graph=graph) as sess:
    # Save out the SavedModel.
    b = saved_model_builder.SavedModelBuilder(saved_model_dir)
    b.add_meta_graph_and_variables(sess,
                               [tf.saved_model.tag_constants.SERVING],
                               signature_def_map={'serving_default': signature})
    b.save()

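The current version of TensorFlow for Poets uses the MobileNet architecture, which does not work with the script above. I therefore retrained with the default Inception v3 by not specifying an architecture, then ran the script above, which completed successfully. I then uploaded the resulting SavedModel to my storage bucket, created a new model and version from the console, pointed the version at the directory in my bucket, and selected runtime version 1.5.

After the model deployed successfully, I wrote a short script to test it: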

from oauth2client.client import GoogleCredentials
from googleapiclient import discovery
from googleapiclient import errors

# Store your full project ID in a variable in the format the API needs.
projectID = 'projects/{}'.format('edocoto-186909')

# Build a representation of the Cloud ML API.
ml = discovery.build('ml', 'v1')

# Build the full resource name of the deployed model.
name1 = 'projects/{}/models/{}'.format('edocoto-186909','flower_inception')

# Create a request to call projects.predict.
# b64imagedata is assumed to hold the base64-encoded JPEG bytes as a string.
request = ml.projects().predict(
    name=name1,
    body={'instances': [{'image_bytes': {'b64': b64imagedata }, 'key': '0'}]})
print (request)

# Make the call.
try:
    response = request.execute()
    print(response)
except errors.HttpError as err:
    # Something went wrong, print out some information.
    print('There was an error making the prediction request. Check the details:')
    print(err._get_reason())
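
The test script above uses `b64imagedata` without showing where it comes from. Here is a minimal sketch of one way to produce it and the request body, assuming the image is a JPEG read from local disk. The helper name `build_predict_body` and the path `flower.jpg` are hypothetical and not part of the original script.

```python
import base64


def build_predict_body(jpeg_bytes, key=None):
    """Wrap raw JPEG bytes in the request body shape used in the question."""
    # JSON cannot carry raw bytes, so the image is base64-encoded and
    # wrapped in a {'b64': ...} object.
    encoded = base64.b64encode(jpeg_bytes).decode('ascii')
    instance = {'image_bytes': {'b64': encoded}}
    # Only send 'key' if the exported signature declares a 'key' input.
    if key is not None:
        instance['key'] = key
    return {'instances': [instance]}


# Hypothetical usage; 'flower.jpg' is a placeholder path.
# with open('flower.jpg', 'rb') as f:
#     body = build_predict_body(f.read(), key='0')
```

The resulting dictionary can be passed as the `body` argument of `ml.projects().predict(...)`.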

This gives the following error:

{'error': "Prediction failed: Expected tensor name: image_bytes, got tensor name: [u'image_bytes', u'key']."}
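
The error above suggests that the keys in each instance have to match the inputs declared in the SavedModel signature, and the first export script only declares `image_bytes`. As a rough sketch, a local check like the following could catch the mismatch before sending the request. The function name `check_instances` is hypothetical, and the list of signature inputs is assumed to be known from the export script.

```python
def check_instances(instances, signature_inputs):
    """Return (index, extra_keys, missing_keys) for each mismatched instance."""
    expected = set(signature_inputs)
    problems = []
    for index, instance in enumerate(instances):
        # Keys sent in the request but not declared in the signature.
        extra = sorted(set(instance) - expected)
        # Keys declared in the signature but absent from the request.
        missing = sorted(expected - set(instance))
        if extra or missing:
            problems.append((index, extra, missing))
    return problems


instances = [{'image_bytes': {'b64': '...'}, 'key': '0'}]
print(check_instances(instances, ['image_bytes']))
# prints [(0, ['key'], [])]
```

With a signature that declares both `image_bytes` and `key`, the same request would pass the check.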

I removed the key field:

body={'instances': {'image_bytes': {'b64': b64imagedata }}})

Now I get the following error:

{'error': 'Prediction failed: Error during model execution: AbortionError(code=StatusCode.INVALID_ARGUMENT, details="NodeDef mentions attr \'dilations\' not in Op<name=Conv2D; signature=input:T, filter:T -> output:T; attr=T:type,allowed=[DT_HALF, DT_FLOAT]; attr=strides:list(int); attr=use_cudnn_on_gpu:bool,default=true; attr=padding:string,allowed=["SAME", "VALID"]; attr=data_format:string,default="NHWC",allowed=["NHWC", "NCHW"]>; NodeDef: conv/Conv2D = Conv2D[T=DT_FLOAT, _output_shapes=[[1,149,149,32]], data_format="NHWC", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Mul, conv/conv2d_params). (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).\n\t [[Node: conv/Conv2D = Conv2D[T=DT_FLOAT, _output_shapes=[[1,149,149,32]], data_format="NHWC", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Mul, conv/conv2d_params)]]")'}

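I don't know what to do now; any help would be appreciated.

Edit 1: After retraining the model on TensorFlow 1.5, I redeployed to Cloud ML Engine and ran the script above. Now I get this error: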

{u'error': u'Prediction failed: Error during model execution: AbortionError(code=StatusCode.INVALID_ARGUMENT, details="contents must be scalar, got shape [1]\n\t [[Node: DecodeJpeg = DecodeJpeg[_output_shapes=[[?,?,3]], acceptable_fraction=1, channels=3, dct_method="", fancy_upscaling=true, ratio=1, try_recover_truncated=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_DecodeJpeg/contents_0_0)]]")'}

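Edit 2: After all this time, and thanks to rhaertel80's efforts, I have successfully deployed to ML Engine. For reference, this is the final converter script, based on the one rhaertel80 provided: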

import tensorflow as tf
from tensorflow.contrib import layers

from tensorflow.python.saved_model import builder as saved_model_builder
from tensorflow.python.saved_model import signature_constants
from tensorflow.python.saved_model import signature_def_utils
from tensorflow.python.saved_model import tag_constants
from tensorflow.python.saved_model import utils as saved_model_utils
import tensorflow.python.saved_model.simple_save


export_dir = 'my_model2'
retrained_graph = 'retrained_graph.pb'
label_count = 5

class Model(object):
    def __init__(self, label_count):
        self.label_count = label_count

    def build_prediction_graph(self, g):
        inputs = {
            'key': keys_placeholder,
            'image_bytes': tensors.input_jpeg
        }

        keys = tf.identity(keys_placeholder)
        outputs = {
            'key': keys,
            'prediction': g.get_tensor_by_name('final_result:0')
        }

        return inputs, outputs

    def export(self, output_dir):
        with tf.Session(graph=tf.Graph()) as sess:
            # This will be our input that accepts a batch of inputs
            image_bytes = tf.placeholder(tf.string, name='input', shape=(None,))
            # Force it to be a single input; will raise an error if we send a batch.
            coerced = tf.squeeze(image_bytes)
            # When we import the graph, we'll connect `coerced` to `DecodeJpeg/contents:0`
            input_map = {'DecodeJpeg/contents:0': coerced}

            with tf.gfile.GFile(retrained_graph, "rb") as f:
                graph_def = tf.GraphDef()
                graph_def.ParseFromString(f.read())
                tf.import_graph_def(graph_def, input_map=input_map, name="")

            keys_placeholder = tf.placeholder(tf.string, shape=[None])

            inputs = {'image_bytes': image_bytes, 'key': keys_placeholder}

            keys = tf.identity(keys_placeholder)
            outputs = {
                'key': keys,
                'prediction': tf.get_default_graph().get_tensor_by_name('final_result:0')}    

            tf.saved_model.simple_save(sess, output_dir, inputs, outputs)

model = Model(label_count)
model.export(export_dir)

The main difference from rhaertel80's code is the change from DecodeJPGInput:0 to DecodeJpeg/contents:0, because the former raised an error saying that the graph contained no such tensor.

Answer 1

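These kinds of errors tend to occur when you train with a newer version of TensorFlow than the one you specify when serving the model. You mentioned that you deployed the model with TF 1.5, but you did not say which TF version you used to train the model and run the export.

My suggestion is to serve with the same version of TF that you used to train the model. Cloud ML Engine officially supports TF 1.6 and will support TF 1.7 within the next week or two (it may even work unofficially now).

Alternatively, you can downgrade the TF version you use to train the model.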
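
To make the version comparison concrete, here is a small sketch that compares the major.minor version used for training with the serving runtime version. The function names are hypothetical; the training version would typically come from `tf.__version__` on the machine that ran the export, and the runtime version is whatever was selected when creating the model version.

```python
def major_minor(version):
    """Return (major, minor) as integers from a string such as '1.5.0'."""
    major, minor = version.split('.')[:2]
    return int(major), int(minor)


def compare_versions(training_version, runtime_version):
    """Describe the training TF version relative to the serving runtime."""
    trained = major_minor(training_version)
    served = major_minor(runtime_version)
    if trained > served:
        # The exported graph may use op attributes the runtime does not know.
        return 'newer'
    if trained < served:
        return 'older'
    return 'match'


# Example: a graph exported with TF 1.6 served on runtime version 1.5.
print(compare_versions('1.6.0', '1.5'))
# prints newer
```

A result of 'newer' corresponds to the situation described above, where either the runtime version should be raised or the training version lowered.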

Answer 2

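The last time I saw that error, it was caused by a version conflict in TensorFlow. Dilations are a new concept, and the API changed from one minor version to the next. I suspect the code was written for an older version of TensorFlow; you need to make sure your version matches the minor version the code was written for.

The easiest way to install an older version is to create a new conda environment and then follow the answer on this page (it is around the third answer and is easier to follow than the others, so look for it).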

How to download previous version of tensorflow?

https://conda.io/docs/user-guide/tasks/manage-environments.html

Deploying and predicting with TensorFlow for Poets on google-cloud-ml
