Question

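I am using the Dataset API as my input pipeline in TensorFlow (version: r1.2). I built my dataset and batched it with a batch size of 128. The dataset is fed into an RNN.

Unfortunately, dataset.output_shapes returns Dimension(None) for the first dimension, so the RNN raises an error: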

Traceback (most recent call last):
  File "untitled1.py", line 188, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
  File "/home/harold/anaconda2/envs/tensorflow_py2.7/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "untitled1.py", line 121, in main
    run_training()
  File "untitled1.py", line 57, in run_training
    is_training=True)
  File "/home/harold/huawei/ConvLSTM/ConvLSTM.py", line 216, in inference
    initial_state=initial_state)
  File "/home/harold/anaconda2/envs/tensorflow_py2.7/lib/python2.7/site-packages/tensorflow/python/ops/rnn.py", line 566, in dynamic_rnn
    dtype=dtype)
  File "/home/harold/anaconda2/envs/tensorflow_py2.7/lib/python2.7/site-packages/tensorflow/python/ops/rnn.py", line 636, in _dynamic_rnn_loop
    "Input size (depth of inputs) must be accessible via shape inference,"
ValueError: Input size (depth of inputs) must be accessible via shape inference, but saw value None.

I think this error is caused by the shape of the input: the first dimension should be the batch size, not None.

Here is the code:

origin_dataset = Dataset.BetweenS_Dataset(FLAGS.data_path)
train_dataset = origin_dataset.train_dataset
test_dataset = origin_dataset.test_dataset
shuffle_train_dataset = train_dataset.shuffle(buffer_size=10000)
shuffle_batch_train_dataset = shuffle_train_dataset.batch(128)
batch_test_dataset = test_dataset.batch(FLAGS.batch_size)

iterator = tf.contrib.data.Iterator.from_structure(
    shuffle_batch_train_dataset.output_types,
    shuffle_batch_train_dataset.output_shapes)
(images, labels) = iterator.get_next()

training_init_op = iterator.make_initializer(shuffle_batch_train_dataset)
test_init_op = iterator.make_initializer(batch_test_dataset)

print(shuffle_batch_train_dataset.output_shapes)

I print output_shapes and it gives:

(TensorShape([Dimension(None), Dimension(36), Dimension(100)]), TensorShape([Dimension(None)]))

I expected the first dimension to be 128, since I batched the dataset:

(TensorShape([Dimension(128), Dimension(36), Dimension(100)]), TensorShape([Dimension(128)]))

Answer 1

The batch dimension is hard-coded in the implementation, so it is always reported as None (as of TF 1.3).

def _padded_shape_to_batch_shape(s):
  return tensor_shape.vector(None).concatenate(
      tensor_util.constant_value_as_shape(s))

This way, batching can cover every element, including a smaller final batch (e.g. dataset_size=14, batch_size=5, last_batch_size=4).
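To make the arithmetic concrete, here is a minimal plain-Python sketch (not TensorFlow code; `batch_sizes` is a hypothetical helper written only for illustration) that lists the size of each batch when no elements are dropped. Because the last batch can be smaller than the others, no single static batch dimension is valid for every batch.

```python
def batch_sizes(dataset_size, batch_size):
    """Return the size of every batch when the remainder is kept."""
    full, remainder = divmod(dataset_size, batch_size)
    sizes = [batch_size] * full
    if remainder:
        sizes.append(remainder)  # the final, smaller batch
    return sizes


sizes = batch_sizes(14, 5)
print(sizes)                # [5, 5, 4]
print(len(set(sizes)) > 1)  # True: batch sizes differ, so no fixed value fits
```

Note that the dataset size is generally not known when the graph is built, which is presumably why the batch dimension is left as None even when the sizes would happen to divide evenly.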

You can use dataset.filter and dataset.map to work around this:

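import tensorflow as tf  # TF 1.x, where the Dataset API lives in tf.contrib.data

d = tf.contrib.data.Dataset.from_tensor_slices([[5] for x in range(14)])
batch_size = 5
d = d.batch(batch_size)
# Drop the final, smaller batch (14 = 5 + 5 + 4).
d = d.filter(lambda e: tf.equal(tf.shape(e)[0], batch_size))
def batch_reshape(e):
    # Restore the static batch dimension; unknown inner dimensions become -1.
    return tf.reshape(e, [batch_size] + [s if s is not None else -1 for s in e.shape[1:].as_list()])
d = d.map(batch_reshape)
r = d.make_one_shot_iterator().get_next()
print('dataset_output_shape = %s' % r.shape)
with tf.Session() as sess:
    while True:  # loops until the iterator raises OutOfRangeError
        print(sess.run(r))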

Output

dataset_output_shape = (5, 1)

[[5] [5] [5] [5] [5]]

[[5] [5] [5] [5] [5]]

OutOfRangeError exception

Answer 2

A drop_remainder parameter has since been added to batch; use it like this:

batch_test_dataset = test_dataset.batch(FLAGS.batch_size, drop_remainder=True)

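From the documentation:

drop_remainder: (Optional.) A tf.bool scalar tf.Tensor, representing whether the last batch should be dropped in the case it has fewer than batch_size elements; the default behavior is not to drop the smaller batch.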
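The semantics described in that documentation can be sketched in plain Python. This is only a model of the behavior, not TensorFlow's implementation, and the `batch` generator below is a hypothetical stand-in for Dataset.batch.

```python
def batch(elements, batch_size, drop_remainder=False):
    """Yield lists of up to batch_size elements, modelling Dataset.batch."""
    current = []
    for element in elements:
        current.append(element)
        if len(current) == batch_size:
            yield current
            current = []
    # Emit the leftover partial batch only when it should not be dropped.
    if current and not drop_remainder:
        yield current


data = [5] * 14
print([len(b) for b in batch(data, 5)])                       # [5, 5, 4]
print([len(b) for b in batch(data, 5, drop_remainder=True)])  # [5, 5]
```

With drop_remainder=True every batch has exactly batch_size elements, which is what lets the batch dimension be reported as a fixed value instead of None.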

Why does dataset.output_shapes return Dimension(None) after batching?

2 answers: