Question

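I have a simple piece of code (which works) that trains a Keras model in TensorFlow using numpy arrays as features and labels. If I then wrap those numpy arrays with tf.data.Dataset.from_tensor_slices in order to train the same Keras model from a TensorFlow dataset, I get an error. I have not been able to figure out why (it may be a TensorFlow or Keras bug, but I may also be missing something). I am using Python 3, TensorFlow 1.10.0, and numpy 1.14.5, with no GPU involved.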

OBS1: The possibility of using a tf.data.Dataset as Keras input is shown in the documentation under "Input tf.data datasets".

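OBS2: In the code below, the block under "#Train with numpy arrays" is the one being executed, using numpy arrays. If that block is commented out and the block under "#Train with tf.data datasets" is used instead, the error is reproduced.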

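OBS3: Line 13 is commented out and starts with "###WORKAROUND 1###". If the comment marker is removed so that the line takes effect for the tf.data.Dataset inputs, the error changes, although I do not fully understand why.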

The complete code is:

import tensorflow as tf
import numpy as np

np.random.seed(1)
tf.set_random_seed(1)

print(tf.__version__)
print(np.__version__)

#Import mnist dataset as numpy arrays
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()#Import
x_train, x_test = x_train / 255.0, x_test / 255.0 #normalizing
###WORKAROUND 1###y_train, y_test = (y_train.astype(dtype='float32'), y_test.astype(dtype='float32'))

x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1]*x_train.shape[2])) #reshaping 28 x 28 images to 1D vectors, similar to Flatten layer in Keras

batch_size = 32
#Create a tf.data.Dataset object equivalent to this data
tfdata_dataset_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
tfdata_dataset_train = tfdata_dataset_train.batch(batch_size).repeat()

#Creates model
keras_model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(512, activation=tf.nn.relu),
    tf.keras.layers.Dropout(0.2, seed=1),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])

#Compile the model
keras_model.compile(optimizer='adam',
                    loss=tf.keras.losses.sparse_categorical_crossentropy,
                    metrics=['accuracy'])

#Train with numpy arrays
keras_training_history = keras_model.fit(x_train,
                y_train,
                initial_epoch=0,
                epochs=1,
                batch_size=batch_size
                )

#Train with tf.data datasets
#keras_training_history = keras_model.fit(tfdata_dataset_train,
#                initial_epoch=0,
#                epochs=1,
#                steps_per_epoch=60000//batch_size
#                )

print(keras_training_history.history)

The error observed when using tf.data.Dataset as input is:

(...)
ValueError: Tensor conversion requested dtype uint8 for Tensor with dtype float32: 'Tensor("metrics/acc/Cast:0", shape=(?,), dtype=float32)'

During handling of the above exception, another exception occurred:

(...)
TypeError: Input 'y' of 'Equal' Op has type float32 that does not match type uint8 of argument 'x'.

As mentioned in OBS3 above, the error after uncommenting line 13 is:

(...)
tensorflow.python.framework.errors_impl.InvalidArgumentError: In[0] is not a matrix
     [[Node: dense/MatMul = MatMul[T=DT_FLOAT, _class=["loc:@training/Adam/gradients/dense/MatMul_grad/MatMul_1"], transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_sequential_input_0_0, dense/MatMul/ReadVariableOp)]]

Any help would be appreciated, including comments confirming that you can reproduce the errors, in which case I can report the bug.

Answer 1

I just upgraded to TensorFlow 1.10 to execute this code. I think this is the answer that is also discussed in the other Stack Overflow thread.

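The code executes, but only if I remove the normalization, because that line seems to use too much CPU memory; I see messages indicating that. I also reduced the number of cores.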
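On the memory point: dividing a uint8 array by the Python float 255.0 makes NumPy produce a float64 array, which costs eight bytes per pixel. The following is a minimal sketch, not the code this answer originally ran. It assumes synthetic arrays with MNIST's image shape stand in for the real data (the name `images` and the sample count of 1000 are illustrative), and it compares that default with normalizing in float32:

```python
import numpy as np

# Synthetic stand-in for MNIST images: uint8 pixels, shape (N, 28, 28).
# N = 1000 is an illustrative choice; the real training set has 60000 images.
rng = np.random.RandomState(1)
images = rng.randint(0, 256, size=(1000, 28, 28)).astype(np.uint8)

# Dividing a uint8 array by a Python float yields float64 (8 bytes per value).
normalized_f64 = images / 255.0

# Casting to float32 first keeps the result in float32 (4 bytes per value).
normalized_f32 = images.astype(np.float32) / 255.0

print(normalized_f64.dtype, normalized_f64.nbytes)
print(normalized_f32.dtype, normalized_f32.nbytes)
```

Whether halving the array size is enough to avoid the memory messages mentioned above depends on the machine, so treat this as something to try rather than a confirmed fix.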


Answer 2

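Installing the tf-nightly build and changing the dtypes of some tensors (the error changes after installing tf-nightly) solved the problem, so it will (hopefully) be fixed in 1.11.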

Related: https://github.com/tensorflow/tensorflow/issues/21894
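The answer does not say which dtypes were changed. As a sketch only, assuming the relevant arrays are the ones passed to `from_tensor_slices`, the dtypes can be inspected and cast explicitly in NumPy before the dataset is built. The helper name `prepare_arrays` and the target dtypes (float32 features, int32 labels) are illustrative assumptions, not something the answer or the linked issue confirms:

```python
import numpy as np

def prepare_arrays(images, labels, feature_dtype=np.float32, label_dtype=np.int32):
    """Flatten images to 2-D, scale pixels to [0, 1], and cast both arrays explicitly.

    Hypothetical helper: the default dtypes are assumptions chosen for illustration.
    """
    features = images.reshape(images.shape[0], -1).astype(feature_dtype) / 255.0
    return features, labels.astype(label_dtype)

# Synthetic stand-ins for the MNIST arrays (uint8 images and uint8 labels).
rng = np.random.RandomState(1)
images = rng.randint(0, 256, size=(64, 28, 28)).astype(np.uint8)
labels = rng.randint(0, 10, size=(64,)).astype(np.uint8)

features, targets = prepare_arrays(images, labels)
print(features.shape, features.dtype)
print(targets.shape, targets.dtype)
```

The resulting `features` and `targets` would then be passed to `tf.data.Dataset.from_tensor_slices((features, targets))` in place of the original arrays.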

Using tf.data.Dataset as training input to a Keras model does not work
