Tensorflow keras with tf.data dataset input

Question

I am new to TensorFlow Keras and datasets. Can anyone help me understand why the following code does not work?

import tensorflow as tf
import tensorflow.keras as keras
import numpy as np
from tensorflow.python.data.ops import dataset_ops
from tensorflow.python.data.ops import iterator_ops
from tensorflow.python.keras.utils import multi_gpu_model
from tensorflow.python.keras import backend as K


data = np.random.random((1000,32))
labels = np.random.random((1000,10))
dataset = tf.data.Dataset.from_tensor_slices((data,labels))
print( dataset)
print( dataset.output_types)
print( dataset.output_shapes)
dataset.batch(10)
dataset.repeat(100)

inputs = keras.Input(shape=(32,))  # Returns a placeholder tensor

# A layer instance is callable on a tensor, and returns a tensor.
x = keras.layers.Dense(64, activation='relu')(inputs)
x = keras.layers.Dense(64, activation='relu')(x)
predictions = keras.layers.Dense(10, activation='softmax')(x)

# Instantiate the model given inputs and outputs.
model = keras.Model(inputs=inputs, outputs=predictions)

# The compile step specifies the training configuration.
model.compile(optimizer=tf.train.RMSPropOptimizer(0.001),
          loss='categorical_crossentropy',
          metrics=['accuracy'])

# Trains for 5 epochs
model.fit(dataset, epochs=5, steps_per_epoch=100)

It fails with the following error:

model.fit(x=dataset, y=None, epochs=5, steps_per_epoch=100)
File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/keras/engine/training.py", line 1510, in fit
validation_split=validation_split)
File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/keras/engine/training.py", line 994, in _standardize_user_data
class_weight, batch_size)
File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/keras/engine/training.py", line 1113, in _standardize_weights
exception_prefix='input')
File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/keras/engine/training_utils.py", line 325, in standardize_input_data
'with shape ' + str(data_shape))
ValueError: Error when checking input: expected input_1 to have 2 dimensions, but got array with shape (32,)

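According to the tf.keras guide, I should be able to pass the dataset directly to model.fit, as this example shows:

Input tf.data datasets

Use the Datasets API to scale to large datasets or multi-device training. Pass a tf.data.Dataset instance to the fit method: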

# Instantiates a toy dataset instance:
dataset = tf.data.Dataset.from_tensor_slices((data, labels))
dataset = dataset.batch(32)
dataset = dataset.repeat()

Don't forget to specify steps_per_epoch when calling fit on a dataset.

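model.fit(dataset, epochs=10, steps_per_epoch=30)

Here, the fit method uses the steps_per_epoch argument: this is the number of training steps the model runs before it moves to the next epoch. Since the Dataset yields batches of data, this snippet does not require a batch_size.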
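As a rough illustration of how steps_per_epoch relates to dataset size and batch size, here is a small sketch. The helper name steps_for_one_pass is hypothetical and not part of Keras; it only does the arithmetic.

```python
import math


def steps_for_one_pass(num_samples, batch_size):
    """Batches needed to see every sample once (the last batch may be partial)."""
    return math.ceil(num_samples / batch_size)


# The guide's toy setup: 1000 samples in batches of 32.
print(steps_for_one_pass(1000, 32))  # 32 (31 full batches plus one batch of 8)

# The question's setup: 1000 samples in batches of 10.
print(steps_for_one_pass(1000, 10))  # 100
```

When the dataset ends with repeat(), it never runs out, so steps_per_epoch only decides how many batches count as one epoch. The guide's value of 30 is slightly less than one full pass over 1000 samples at batch size 32.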

Datasets can also be used for validation:

dataset = tf.data.Dataset.from_tensor_slices((data, labels))
dataset = dataset.batch(32).repeat()

val_dataset = tf.data.Dataset.from_tensor_slices((val_data, val_labels))
val_dataset = val_dataset.batch(32).repeat()

model.fit(dataset, epochs=10, steps_per_epoch=30,
      validation_data=val_dataset,
      validation_steps=3)

What is wrong with my code, and what is the correct way to do this?

Answer 1

You are not defining an iterator, which is why the error occurs.

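import numpy as np
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras.layers import Input, Dense

data = np.random.random((1000,32))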
labels = np.random.random((1000,10))
dataset = tf.data.Dataset.from_tensor_slices((data,labels))
dataset = dataset.batch(10).repeat()
inputs = Input(shape=(32,))  # Returns a placeholder tensor

# A layer instance is callable on a tensor, and returns a tensor.
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)

# Instantiate the model given inputs and outputs.
model = keras.Model(inputs=inputs, outputs=predictions)

# The compile step specifies the training configuration.
model.compile(optimizer=tf.train.RMSPropOptimizer(0.001),
          loss='categorical_crossentropy',
          metrics=['accuracy'])

# Trains for 5 epochs
model.fit(dataset.make_one_shot_iterator(), epochs=5, steps_per_epoch=100)

Epoch 1/5
100/100 [=============================] - 1s 8ms/step - loss: 11.5787 - acc: 0.1010
Epoch 2/5
100/100 [=============================] - 0s 4ms/step - loss: 11.4846 - acc: 0.0990
Epoch 3/5
100/100 [=============================] - 0s 4ms/step - loss: 11.4690 - acc: 0.1270
Epoch 4/5
100/100 [=============================] - 0s 4ms/step - loss: 11.4611 - acc: 0.1300
Epoch 5/5
100/100 [=============================] - 0s 4ms/step - loss: 11.4546 - acc: 0.1360

These are the results on my system.
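The code above relies on TensorFlow 1.x names (tf.train.RMSPropOptimizer, make_one_shot_iterator). For readers on a newer release, here is a minimal sketch of the same model, assuming TensorFlow 2.x, where a batched dataset can be passed to fit directly. The float32 casts and verbose=0 are my additions, not part of the original answer.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Random toy data, cast to float32 to match the model's default dtype.
data = np.random.random((1000, 32)).astype("float32")
labels = np.random.random((1000, 10)).astype("float32")

# Keep the result of each transformation: batches of 10, repeated indefinitely.
dataset = tf.data.Dataset.from_tensor_slices((data, labels))
dataset = dataset.batch(10).repeat()

inputs = keras.Input(shape=(32,))
x = keras.layers.Dense(64, activation="relu")(inputs)
x = keras.layers.Dense(64, activation="relu")(x)
predictions = keras.layers.Dense(10, activation="softmax")(x)

model = keras.Model(inputs=inputs, outputs=predictions)
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=0.001),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# 100 steps of 10 samples each is one pass over the 1000 samples per epoch.
history = model.fit(dataset, epochs=5, steps_per_epoch=100, verbose=0)
print(len(history.history["loss"]))  # 5
```

The labels are random, so the loss and accuracy values themselves carry no meaning. The point is only that fit accepts the dataset once it is batched.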

Answer 2

Regarding the original question of why you got the error:

Error when checking input: expected input_1 to have 2 dimensions, but got array with shape (32,)

Your code breaks because you did not assign the result of .batch() back to the dataset variable, like this:

dataset = dataset.batch(10)

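You only called dataset.batch() and discarded the result. Dataset transformations return a new dataset and leave the original unchanged.

It breaks because, without batch(), the output tensors are not batched, i.e. each element has shape (32,) instead of a batched shape such as (1,32).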
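The difference can be checked by inspecting element shapes. This is a small sketch, assuming TensorFlow 2.x, where element_spec takes the place of the output_shapes property used in the question. The variable names unbatched and batched are mine.

```python
import numpy as np
import tensorflow as tf

data = np.random.random((1000, 32)).astype("float32")
labels = np.random.random((1000, 10)).astype("float32")

unbatched = tf.data.Dataset.from_tensor_slices((data, labels))

# batch() does not modify `unbatched`; it returns a new dataset.
unbatched.batch(10)            # result discarded, nothing changes
batched = unbatched.batch(10)  # result kept

# element_spec is a (features, labels) tuple of TensorSpec objects.
x_spec, _ = unbatched.element_spec
bx_spec, _ = batched.element_spec

print(x_spec.shape.as_list())   # [32]        one sample, no batch dimension
print(bx_spec.shape.as_list())  # [None, 32]  leading batch dimension added
```

The model's Input(shape=(32,)) expects the second form, with a leading batch dimension, which is why the unbatched dataset triggers the "expected input_1 to have 2 dimensions" error.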
