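Translation model prediction: TypeError: Object of type 'EagerTensor' is not JSON serializable

Time: 2019-10-03 13:24:16

Tags: python tensorflow tensor2tensor

I followed the translation Colab notebook tutorial recommended by Google's tensor2tensor repository.

After exporting the model and uploading it to Google's AI Platform for online prediction, I am unable to make requests to the model.

I believe the input to the translation model is a tensor of the source text. But I get the error TypeError: Object of type 'EagerTensor' is not JSON serializable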


def encode(input_str, output_str=None):
  """Input str to features dict, ready for inference"""
  inputs = encoders["inputs"].encode(input_str) + [1]  # add EOS id
  batch_inputs = tf.reshape(inputs, [1, -1, 1])  # Make it 3D.
  return {"inputs": batch_inputs}

enfr_problem = problems.problem(PROBLEM)
encoders = enfr_problem.feature_encoders(DATA_DIR)

encoded_inputs = encode("Some text")
model_output = predict_json('project_name','model_name', encoded_inputs,'version_1')["outputs"]

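I have tried converting the tensor to numpy, but still no luck. Can someone point me in the right direction?

2 answers:

Answer 0: (score: 0)

The problem is that TensorFlow returns an EagerTensor when you do the following: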

inputs = encoders["inputs"].encode(input_str) + [1]  # add EOS id
batch_inputs = tf.reshape(inputs, [1, -1, 1])

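An EagerTensor cannot be serialized to JSON. Unfortunately, a numpy array (3D or otherwise) cannot be serialized by the json module either. But a numpy array can easily be converted to a nested list, which can. An example: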

import json
import numpy as np
import tensorflow as tf

a = np.array([1, 2, 3])
b = np.array([1, 2, 3])
c = tf.multiply(a, b)

print(c)  # -> tf.Tensor([1 4 9], shape=(3,), dtype=int64)
print(c.numpy())  # -> [1 4 9]
print(c.numpy().tolist())  # -> [1, 4, 9]

with open("example.json", "w") as f:
    # The first two calls raise, so they are commented out to keep the script runnable.
    # json.dump(c, f)  # TypeError: Object of type EagerTensor is not JSON serializable
    # json.dump(c.numpy(), f)  # TypeError: Object of type ndarray is not JSON serializable
    json.dump(c.numpy().tolist(), f)  # works!
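
If converting by hand at every call site gets tedious, another option is to give the json module a fallback hook. The following is only a sketch: TensorFriendlyEncoder is a hypothetical name, and FakeTensor and FakeArray are stand-ins so the snippet runs without TensorFlow or NumPy installed. It assumes the real objects follow the usual conventions, where an EagerTensor exposes .numpy() and an ndarray exposes .tolist().

```python
import json


class TensorFriendlyEncoder(json.JSONEncoder):
    """JSON encoder that unwraps tensor-like and array-like objects (hypothetical helper)."""

    def default(self, obj):
        # EagerTensor-style objects expose .numpy(); unwrap to an array first.
        if hasattr(obj, "numpy"):
            obj = obj.numpy()
        # ndarray-style objects expose .tolist(), which yields plain Python lists.
        if hasattr(obj, "tolist"):
            return obj.tolist()
        # Anything else falls through to the standard TypeError.
        return super().default(obj)


# Hypothetical stand-ins for np.ndarray and tf.Tensor, used only for this demo.
class FakeArray:
    def __init__(self, data):
        self.data = data

    def tolist(self):
        return self.data


class FakeTensor:
    def __init__(self, data):
        self.data = data

    def numpy(self):
        return FakeArray(self.data)


encoded = json.dumps({"inputs": FakeTensor([[1, 4, 9]])}, cls=TensorFriendlyEncoder)
print(encoded)  # -> {"inputs": [[1, 4, 9]]}
```

This only helps when you control the json.dumps call yourself. If a client library serializes the request for you, the values still have to be converted to lists before they are handed over.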

Since your code snippet is not complete enough, I cannot give a concrete example. But

return {"inputs": batch_inputs.numpy().tolist()}

should do the job.
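
To see what the resulting payload looks like, here is a rough sketch that builds the same [1, length, 1] shape with plain nested lists. encode_as_lists and fake_token_ids are hypothetical names introduced for illustration; in the question the ids would come from encoders["inputs"].encode(input_str). Whether the deployed model accepts exactly this shape is an assumption carried over from the question's code.

```python
import json


def encode_as_lists(token_ids, eos_id=1):
    """Return a features dict whose "inputs" is a nested list shaped [1, length, 1]."""
    ids = list(token_ids) + [eos_id]  # append the EOS id, as in the question
    # Pure-Python equivalent of tf.reshape(ids, [1, -1, 1]).
    batch_inputs = [[[int(i)] for i in ids]]
    return {"inputs": batch_inputs}


# Hypothetical token ids standing in for the real encoder output.
fake_token_ids = [17, 42, 256]
features = encode_as_lists(fake_token_ids)
payload = json.dumps(features)
print(payload)  # -> {"inputs": [[[17], [42], [256], [1]]]}
```

Because every leaf is a built-in int, the dict can be serialized without a TypeError.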

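Answer 1: (score: 0)

If you want to save tensor data held in a dict to a JSON file, a simple solution is to recurse through the dictionary and convert each value into something JSON-serializable with the appropriate function (e.g. a string, if you only need to store it as text). I am sure tensorflow has a way to save your data as a pickle file, if that is what you really want to do (i.e. save your data).

The following code recursively converts the contents of a dict to strings, but you should be able to modify it easily (numpify, jsonify, etc.) for your use case. My use case was saving data in a human-readable format (rather than just using torch.save):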

#%%

def _to_json_dict_with_strings(dictionary):
    """
    Convert a dict to a dict whose leaves are all strings. Recursively converts
    every value that is not a dictionary to a string.

    Use case:
        - saving a dictionary of tensors (convert the tensors to strings!)
        - saving arguments from a script (e.g. argparse) so they print prettily
    """
    if type(dictionary) != dict:
        return str(dictionary)
    d = {k: _to_json_dict_with_strings(v) for k, v in dictionary.items()}
    return d

def to_json(dic):
    # Accept either a plain dict or an object with attributes
    # (e.g. an argparse.Namespace), whose __dict__ is used instead.
    if type(dic) is dict:
        dic = dict(dic)
    else:
        dic = dic.__dict__
    return _to_json_dict_with_strings(dic)

def save_to_json_pretty(dic, path, mode='w', indent=4, sort_keys=True):
    import json

    with open(path, mode) as f:
        json.dump(to_json(dic), f, indent=indent, sort_keys=sort_keys)

def my_pprint(dic):
    """
    Pretty-print a dict (or namespace-like object) as JSON with string leaves.

    Note: this is not the same as pprint.
    """
    import json

    # convert all leaf values to strings recursively with their native str function
    dic = to_json(dic)
    # pretty print
    pretty_dic = json.dumps(dic, indent=4, sort_keys=True)
    print(pretty_dic)

import torch
# import json  # json.dumps on its own raises non-serializable errors for torch.Tensors
from pprint import pprint

dic = {'x': torch.randn(1, 3), 'rec': {'y': torch.randn(1, 3)}}

my_pprint(dic)
pprint(dic)

Output:

{
    "rec": {
        "y": "tensor([[-0.3137,  0.3138,  1.2894]])"
    },
    "x": "tensor([[-1.5909,  0.0516, -1.5445]])"
}
{'rec': {'y': tensor([[-0.3137,  0.3138,  1.2894]])},
 'x': tensor([[-1.5909,  0.0516, -1.5445]])}
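
Converting leaves to strings is fine for reading, but the numbers cannot be loaded back as numbers. A possible variation, sketched below, converts leaves to nested lists instead. to_jsonable is a hypothetical helper and FakeTensor is a stand-in so the snippet runs without torch or TensorFlow; it assumes the real tensors expose .tolist() (as torch.Tensor and np.ndarray do) or .numpy() (as an EagerTensor does).

```python
import json


def to_jsonable(value):
    """Recursively convert dicts, sequences and tensor-like leaves to JSON-friendly types."""
    if isinstance(value, dict):
        return {str(k): to_jsonable(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(v) for v in value]
    if hasattr(value, "numpy") and not hasattr(value, "tolist"):
        value = value.numpy()  # e.g. an EagerTensor: unwrap to an array first
    if hasattr(value, "tolist"):
        return value.tolist()  # e.g. torch.Tensor or np.ndarray
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    return str(value)  # last resort, same as the string-based version


# Hypothetical stand-in for a tensor, used only for this demo.
class FakeTensor:
    def __init__(self, data):
        self.data = data

    def tolist(self):
        return self.data


dic = {"x": FakeTensor([[1.5, 2.0]]), "rec": {"y": FakeTensor([[3.0]])}, "n": 3}
print(json.dumps(to_jsonable(dic), sort_keys=True))
# -> {"n": 3, "rec": {"y": [[3.0]]}, "x": [[1.5, 2.0]]}
```

The trade-off is that the output no longer records that a value was a tensor; it is just nested lists.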

Related links:

https://discuss.pytorch.org/t/typeerror-tensor-is-not-json-serializable/36065/3
How to prettyprint a JSON file?
https://github.com/fossasia/visdom/issues/554