I am trying to port an autoencoder that was previously written in Keras (TensorFlow backend) to a pure TensorFlow implementation. My initial attempt behaved completely differently from my Keras model, so I stripped both implementations down to the point where they should start behaving alike. However, even the minimal example shown below is problematic (I cannot share the dataset, but it consists of a few thousand binary vectors, i.e. inputs is a two-dimensional numpy array of dtype np.float32 that contains only the values 0 and 1):
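Since the real dataset cannot be shared, a hypothetical stand-in with the same characteristics could look like this (the name digits_train matches the variable used below; the shape and size are made up):

import numpy as np

# hypothetical stand-in: a few thousand binary vectors as a 2-D float32
# array containing only the values 0.0 and 1.0
rng = np.random.RandomState(0)
digits_train = rng.randint(0, 2, size=(4096, 64)).astype(np.float32)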
from functools import partial

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import Adam

inputs = digits_train
batch_size = 256
hidden_nodes = 120
learning_rate = 0.01
epochs = 100
# Tensorflow
dataset = tf.data.Dataset.from_tensor_slices(inputs)
dataset = dataset.shuffle(buffer_size=inputs.shape[0])  # shuffle samples, not batches
dataset = dataset.batch(batch_size)
dataset = dataset.repeat()  # repeat indefinitely; the training loop decides when to stop
dataset_it = dataset.make_one_shot_iterator()
input_layer = dataset_it.get_next()
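# With shuffle -> batch -> repeat, the samples are reshuffled on every pass
# over the data, and each evaluation of get_next() (i.e. each sess.run that
# touches input_layer) yields one batch of shape (batch_size, n_features).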
layer_settings = {
    'activation': tf.nn.sigmoid,
    'kernel_initializer': tf.initializers.random_normal
}
layer = partial(tf.layers.dense, **layer_settings)
hidden_layer = layer(input_layer, hidden_nodes)
output_layer = layer(hidden_layer, inputs.shape[1])
loss = tf.reduce_mean(tf.square(output_layer - input_layer))
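# mean over all entries of the squared error; this gives the same value as
# Keras' 'mean_squared_error' (mean over features, then over the batch)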
optimizer = tf.train.AdamOptimizer(learning_rate)
training_op = optimizer.minimize(loss)
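# For reference (hypothetical alternative, not used here): minimize() is
# shorthand for the two-step form
#   grads_and_vars = optimizer.compute_gradients(loss)
#   training_op = optimizer.apply_gradients(grads_and_vars)
# which would allow inspecting the gradients later.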
losses = []
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for e in range(epochs):
        # a single sess.run = a single optimizer step on a single batch
        _, loss_ = sess.run([training_op, loss])
        losses.append(loss_)
plt.plot(losses, label="Tensorflow")
# Keras
hidden_layer = Dense(
    hidden_nodes,
    input_shape=(inputs.shape[1],),
    activation='sigmoid',
    kernel_initializer='RandomNormal',
)
output_layer = Dense(
    inputs.shape[1],
    activation='sigmoid',
    kernel_initializer='RandomNormal',
)
model = Sequential([hidden_layer, output_layer])
adam = Adam(lr=learning_rate)
model.compile(optimizer=adam, loss='mean_squared_error')
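# Note: model.fit() performs ceil(n_samples / batch_size) optimizer steps
# per epoch, i.e. one full pass over the dataset per epoch.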
losses = []
for e in range(epochs):
    # could be done without the for loop but I want to do
    # some other stuff here later on
    h = model.fit(inputs,
                  inputs,
                  batch_size=batch_size,
                  initial_epoch=e,
                  epochs=(e + 1),
                  verbose=0)
    losses.append(h.history['loss'][0])  # history holds exactly one epoch here
plt.plot(losses, label="Keras")
plt.xlabel("Epoch")
plt.ylabel("MSE")
plt.legend()
plt.show()
This produces clearly different learning curves for the two models. The differences are not a result of random weight initialization; the curves look like this on every run. What I particularly do not understand: why does the Keras model converge so quickly? Why is there jitter in the TensorFlow curve but none in the Keras curve (does Keras smooth the loss, and can that be disabled)? And why does the TensorFlow model train more than twice as fast (timings not shown here)?

The only thing I am not sure about is whether my implementation of batch updates for the TensorFlow model is correct. If I add another inner for loop around the sess.run([training_op, loss]) call, as sketched below, the learning curves start to look much more similar, but I was under the impression that each pass through the epochs loop should already iterate over the entire dataset. Moreover, the Keras model still consistently performs better, and the jitter is still there.
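A minimal sketch of that modified training loop, assuming one optimizer step per batch so that every epoch makes one full pass over the data (steps_per_epoch derived from the array shape):

steps_per_epoch = int(np.ceil(inputs.shape[0] / batch_size))
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for e in range(epochs):
        for _ in range(steps_per_epoch):  # inner loop: one step per batch
            _, loss_ = sess.run([training_op, loss])
        losses.append(loss_)  # keep the loss of the last batch in the epoch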