Using tf.data.Dataset with a Keras input layer on TensorFlow 2.0

Date: 2019-05-03 00:38:12

Tags: python tensorflow keras tensorflow2.0

I am experimenting with the TensorFlow 2.0 alpha and found that it works as expected with NumPy arrays, but when I use a tf.data.Dataset I get an input dimension error. I am using the iris dataset as the simplest example to demonstrate this:

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder

import tensorflow as tf
from tensorflow.python import keras

iris = datasets.load_iris()

scl = StandardScaler()
ohe = OneHotEncoder(categories='auto')
data_norm = scl.fit_transform(iris.data)
data_target = ohe.fit_transform(iris.target.reshape(-1,1)).toarray()
train_data, val_data, train_target, val_target = train_test_split(data_norm, data_target, test_size=0.1)
train_data, test_data, train_target, test_target = train_test_split(train_data, train_target, test_size=0.2)


train_dataset = tf.data.Dataset.from_tensor_slices((train_data, train_target))
train_dataset.batch(32)

test_dataset = tf.data.Dataset.from_tensor_slices((test_data, test_target))
test_dataset.batch(32)

val_dataset = tf.data.Dataset.from_tensor_slices((val_data, val_target))
val_dataset.batch(32)

mdl = keras.Sequential([
    keras.layers.Dense(16, input_dim=4, activation='relu'),
    keras.layers.Dense(8, activation='relu'),
    keras.layers.Dense(8, activation='relu'),
    keras.layers.Dense(3, activation='sigmoid')]
)

mdl.compile(
    optimizer=keras.optimizers.Adam(0.01),
    loss=keras.losses.categorical_crossentropy,
    metrics=[keras.metrics.categorical_accuracy]
    )

history = mdl.fit(train_dataset, epochs=10, steps_per_epoch=15, validation_data=val_dataset)

I get the following error message:

ValueError: Error when checking input: expected dense_16_input to have shape (4,) but got array with shape (1,)

as if the dataset had only one dimension. If I pass input_dim=1 instead, I get a different error:

InvalidArgumentError: Incompatible shapes: [3] vs. [4]
     [[{{node metrics_5/categorical_accuracy/Equal}}]] [Op:__inference_keras_scratch_graph_8223]
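
A quick way to see why the shapes disagree is to print the dataset objects; a minimal sketch reusing the variables defined above (the comments describe the element shapes, since the exact repr formatting varies by TF version):

print(train_dataset)
# TensorSliceDataset, element shapes ((4,), (3,)) -- single examples, no batch dimension
print(train_dataset.batch(32))
# BatchDataset, element shapes ((None, 4), (None, 3)) -- batched examples

The first print shows that fit() is being handed individual examples with feature shape (4,). With no batch dimension present, Keras treats the leading 4 as the batch size, which matches the first error, and the 3-element targets then look like a batch of 3, which matches the [3] vs. [4] mismatch.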

What is the correct way to use a tf.data.Dataset with a Keras model on TensorFlow 2.0?

1 Answer:

Answer 0 (score: 2)

A couple of changes should fix your code. First, the dataset transformations do not happen in place, so you need to assign the returned dataset back to the variable. Second, you should also add a repeat() transformation so that the dataset keeps producing examples after all the data has been seen:

...
train_dataset = tf.data.Dataset.from_tensor_slices((train_data, train_target))
train_dataset = train_dataset.batch(32)
train_dataset = train_dataset.repeat()

val_dataset = tf.data.Dataset.from_tensor_slices((val_data, val_target))
val_dataset = val_dataset.batch(32)
val_dataset = val_dataset.repeat()
...

You also need to add a validation_steps argument to the model.fit() call:

history = mdl.fit(train_dataset, epochs=10, steps_per_epoch=15, validation_data=val_dataset, validation_steps=1)

For your own data, you may need to adjust the batch_size of the validation dataset so that the validation data is cycled through only once per step.
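
To make that last point concrete, here is a minimal sketch (my addition, not part of the original answer; val_batch_size and the math.ceil sizing are assumptions) of choosing validation_steps so that one validation pass covers the whole validation set exactly once:

import math

val_batch_size = 32  # assumption: same batch size as used for training
# With 15 validation samples (10% of the 150 iris rows), this gives 1 step,
# matching validation_steps=1 above.
validation_steps = math.ceil(len(val_data) / val_batch_size)

history = mdl.fit(train_dataset, epochs=10, steps_per_epoch=15,
                  validation_data=val_dataset, validation_steps=validation_steps)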