Estimator.predict() has shape issues?

Time: 2018-04-19 01:56:28

Tags: tensorflow tensorflow-estimator

I can train and evaluate my TensorFlow Estimator model without any problems. When I run prediction, however, I get this error:

InvalidArgumentError (see above for traceback): output_shape has incorrect number of elements: 68 should be: 2
     [[Node: output = SparseToDense[T=DT_INT32, Tindices=DT_INT32, validate_indices=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ToInt32, ToInt32_1, ToInt32_2, bidirectional_rnn/bidirectional_rnn/fw/fw/time)]]

All of the model functions use the same architecture:

def _train_model_fn(features, labels, mode, params):
    features = _network_fn(features, mode, params)

    outputs = _get_output(features, params["output_layer"],
                          params["num_classes"])
    predictions = {
        "outputs": outputs
    }

    ... # loss initialization and whatnot

def _eval_model_fn(features, labels, mode, params):
    features = _network_fn(features, mode, params)
    outputs = _get_output(features, params["output_layer"], params["num_classes"])
    predictions = {
        "outputs": outputs
    }

    ... # loss initialization and whatnot


def _predict_model_fn(features, mode, params):
    features = _network_fn(features, mode, params)
    outputs = _get_output(features, params["output_layer"], params["num_classes"])
    predictions = {
        "outputs": outputs
    }

    ...

Here is the prediction code:

def predict(params, features, checkpoint_dir):
    estimator = tf.estimator.Estimator(model_fn=_predict_model_fn,
                                       params=params,
                                       model_dir=checkpoint_dir)
    predictions = estimator.predict(input_fn=_input_fn(features))
    for i, p in enumerate(predictions):
        print(i, p)

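I also checked the shape each layer produces as the input passes through it, both during training and during prediction. The shapes are identical:

Training: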

conv2d [1, 358, 358, 16]
max_pool2d [1, 179, 179, 16]
collapse_to_rnn_dims [1, 179, 2864]
birnn [1, 179, 64]

Prediction:

conv2d [1, 358, 358, 16]
max_pool2d [1, 179, 179, 16]
collapse_to_rnn_dims [1, 179, 2864]
birnn [1, 179, 64]

Here is the SparseTensor that I pass to sparse_to_dense:

Training:

SparseTensor(indices=Tensor("CTCBeamSearchDecoder:0", shape=(?, 2), dtype=int64), values=Tensor("CTCBeamSearchDecoder:1", shape=(?,), dtype=int64), dense_shape=Tensor("CTCBeamSearchDecoder:2", shape=(2,), dtype=int64))

Evaluation:

SparseTensor(indices=Tensor("CTCBeamSearchDecoder:0", shape=(?, 2), dtype=int64), values=Tensor("CTCBeamSearchDecoder:1", shape=(?,), dtype=int64), dense_shape=Tensor("CTCBeamSearchDecoder:2", shape=(2,), dtype=int64))

Prediction:

SparseTensor(indices=Tensor("CTCBeamSearchDecoder:0", shape=(?, 2), dtype=int64), values=Tensor("CTCBeamSearchDecoder:1", shape=(?,), dtype=int64), dense_shape=Tensor("CTCBeamSearchDecoder:2", shape=(2,), dtype=int64))

They are all pretty much the same.

Why is this happening? Shouldn't _predict_model_fn work, given that it follows the same architecture as the other model_fns?

Here is the full stack trace:

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_log_step_count_steps': 100, '_keep_checkpoint_max': 5, '_task_type': 'worker', '_is_chief': True, '_service': None, '_save_summary_steps': 100, '_model_dir': 'checkpoint\\model-20180419-150303', '_task_id': 0, '_evaluation_master': '', '_tf_random_seed': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x00000091F58B3080>, '_num_ps_replicas': 0, '_master': '', '_save_checkpoints_secs': 600, '_session_config': None, '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_global_id_in_cluster': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from checkpoint\model-20180419-150303\model.ckpt-1
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Process Process-2:
Traceback (most recent call last):
  File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1361, in _do_call
    return fn(*args)
  File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1340, in _run_fn
    target_list, status, run_metadata)
  File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: output_shape has incorrect number of elements: 68 should be: 2
     [[Node: output = SparseToDense[T=DT_INT32, Tindices=DT_INT32, validate_indices=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ToInt32, ToInt32_1, ToInt32_2, bidirectional_rnn/bidirectional_rnn/fw/fw/time)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\asus.11\Anaconda3\lib\multiprocessing\process.py", line 249, in _bootstrap
    self.run()
  File "C:\Users\asus.11\Anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\asus.11\Documents\Optimized_OCR\trainer\backend\train_ocr.py", line 42, in evaluate_model
    evaluate(architecture_params, images, labels, checkpoint_dir)
  File "C:\Users\asus.11\Documents\Optimized_OCR\trainer\backend\tf\experiment_ops.py", line 82, in evaluate
    predict(params, features, checkpoint_dir)
  File "C:\Users\asus.11\Documents\Optimized_OCR\trainer\backend\tf\experiment_ops.py", line 90, in predict
    for i, p in enumerate(predictions):
  File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\estimator\estimator.py", line 492, in predict
    preds_evaluated = mon_sess.run(predictions)
  File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py", line 546, in run
    run_metadata=run_metadata)
  File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1022, in run
    run_metadata=run_metadata)
  File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1113, in run
    raise six.reraise(*original_exc_info)
  File "C:\Users\asus.11\Anaconda3\lib\site-packages\six.py", line 693, in reraise
    raise value
  File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1098, in run
    return self._sess.run(*args, **kwargs)
  File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1170, in run
    run_metadata=run_metadata)
  File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py", line 950, in run
    return self._sess.run(*args, **kwargs)
  File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 905, in run
    run_metadata_ptr)
  File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1137, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1355, in _do_run
    options, run_metadata)
  File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1374, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: output_shape has incorrect number of elements: 68 should be: 2
     [[Node: output = SparseToDense[T=DT_INT32, Tindices=DT_INT32, validate_indices=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ToInt32, ToInt32_1, ToInt32_2, bidirectional_rnn/bidirectional_rnn/fw/fw/time)]]

Caused by op 'output', defined at:
  File "<string>", line 1, in <module>
  File "C:\Users\asus.11\Anaconda3\lib\multiprocessing\spawn.py", line 106, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\asus.11\Anaconda3\lib\multiprocessing\spawn.py", line 119, in _main
    return self._bootstrap()
  File "C:\Users\asus.11\Anaconda3\lib\multiprocessing\process.py", line 249, in _bootstrap
    self.run()
  File "C:\Users\asus.11\Anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\asus.11\Documents\Optimized_OCR\trainer\backend\train_ocr.py", line 42, in evaluate_model
    evaluate(architecture_params, images, labels, checkpoint_dir)
  File "C:\Users\asus.11\Documents\Optimized_OCR\trainer\backend\tf\experiment_ops.py", line 82, in evaluate
    predict(params, features, checkpoint_dir)
  File "C:\Users\asus.11\Documents\Optimized_OCR\trainer\backend\tf\experiment_ops.py", line 90, in predict
    for i, p in enumerate(predictions):
  File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\estimator\estimator.py", line 479, in predict
    features, None, model_fn_lib.ModeKeys.PREDICT, self.config)
  File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\estimator\estimator.py", line 793, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "C:\Users\asus.11\Documents\Optimized_OCR\trainer\backend\tf\experiment_ops.py", line 217, in _predict_model_fn
    outputs = _get_output(features, params["output_layer"], params["num_classes"])
  File "C:\Users\asus.11\Documents\Optimized_OCR\trainer\backend\tf\experiment_ops.py", line 134, in _get_output
    return _sparse_to_dense(decoded, name="output")
  File "C:\Users\asus.11\Documents\Optimized_OCR\trainer\backend\tf\experiment_ops.py", line 38, in _sparse_to_dense
    name=name)
  File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\ops\sparse_ops.py", line 791, in sparse_to_dense
    name=name)
  File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_sparse_ops.py", line 2401, in _sparse_to_dense
    name=name)
  File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3271, in create_op
    op_def=op_def)
  File "C:\Users\asus.11\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1650, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): output_shape has incorrect number of elements: 68 should be: 2
     [[Node: output = SparseToDense[T=DT_INT32, Tindices=DT_INT32, validate_indices=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ToInt32, ToInt32_1, ToInt32_2, bidirectional_rnn/bidirectional_rnn/fw/fw/time)]]

Update

I tried the same architecture in a different training run and got a different shape error:

InvalidArgumentError (see above for traceback): output_shape has incorrect number of elements: 69 should be: 2
     [[Node: output = SparseToDense[T=DT_INT32, Tindices=DT_INT32, validate_indices=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ToInt32, ToInt32_1, ToInt32_2, bidirectional_rnn/bidirectional_rnn/fw/fw/time)]]

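The problem seems to lie in ctc_beam_search_decoder. Switching to ctc_greedy_decoder does not help either. Why does this happen?

Further update

I have uploaded a reproducible example: https://github.com/selcouthlyBlue/ShapeErrorReproduce

1 answer:

Answer 0 (score: 0)

I finally found the error. The problem was actually in how I was calling sparse_to_dense. I was passing the arguments in the wrong order, with the values before the shape: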

return tf.sparse_to_dense(tf.to_int32(decoded[0].indices),
                          tf.to_int32(decoded[0].values),
                          tf.to_int32(decoded[0].dense_shape),
                          name="output")

The order should be as follows, with the shape before the values:

return tf.sparse_to_dense(tf.to_int32(decoded[0].indices),
                          tf.to_int32(decoded[0].dense_shape),
                          tf.to_int32(decoded[0].values),
                          name="output")
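
To see why the swap produces a message like "68 should be: 2", here is a minimal pure-Python sketch of the argument check. It is not TensorFlow's implementation: the function below is a hypothetical stand-in that only handles 2-D outputs, and the sample indices, values, and shape are made up. With the arguments swapped, the decoded values sit in the output_shape slot, so the reported element count is simply the number of decoded labels. That would also explain why a different run reported 69 instead of 68.

```python
def sparse_to_dense(sparse_indices, output_shape, sparse_values, default_value=0):
    """Pure-Python stand-in for a 2-D sparse-to-dense conversion (illustration only)."""
    rank = len(sparse_indices[0]) if sparse_indices else len(output_shape)
    # Same kind of check that produced the error in the question:
    # output_shape needs one entry per index column.
    if len(output_shape) != rank:
        raise ValueError(
            "output_shape has incorrect number of elements: "
            f"{len(output_shape)} should be: {rank}")
    if rank != 2:
        raise ValueError("this sketch only handles 2-D outputs")
    rows, cols = output_shape
    dense = [[default_value] * cols for _ in range(rows)]
    for (r, c), v in zip(sparse_indices, sparse_values):
        dense[r][c] = v
    return dense


# Hypothetical decoder output: a batch of one sequence with 3 labels.
indices = [[0, 0], [0, 1], [0, 2]]
values = [7, 4, 9]
dense_shape = [1, 3]

# Correct order: indices, shape, values.
dense = sparse_to_dense(indices, dense_shape, values)

# Swapped order: the values land in the output_shape slot.
try:
    sparse_to_dense(indices, values, dense_shape)
    swapped_error = None
except ValueError as err:
    swapped_error = str(err)
```

In TensorFlow 1.x the corresponding parameters are named sparse_indices, output_shape, and sparse_values, so passing them by keyword is one way to guard against this kind of positional mix-up. Another option that may be worth trying is tf.sparse_tensor_to_dense, which takes the SparseTensor itself.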