Dropout layer directly in TensorFlow: how to train?

Date: 2019-04-01 14:03:32

Tags: python tensorflow keras generative-adversarial-network dropout

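After creating a model in Keras, I want to get the gradients and apply them directly in TensorFlow using the tf.train.AdamOptimizer class. However, since I am using a Dropout layer, I don't know how to tell the model whether it is in training mode. The training keyword is not accepted. Here is the code: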

    net_input = Input(shape=(1,))
    net_1 = Dense(50)
    net_2 = ReLU()
    net_3 = Dropout(0.5)
    net = Model(net_input, net_3(net_2(net_1(net_input))))

    #mycost = ...

    optimizer = tf.train.AdamOptimizer()
    gradients = optimizer.compute_gradients(mycost, var_list=[net.trainable_weights])
    # perform some operations on the gradients
    # gradients = ...
    trainstep = optimizer.apply_gradients(gradients)

I get the same behavior with or without the Dropout layer, even with a dropout rate of 1. How can I fix this?

2 Answers:

Answer 0: (score: 1)

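Keras layers inherit from the tf.keras.layers.Layer class. The Keras API handles the training flag internally in model.fit. When Keras Dropout is used with a pure TensorFlow training loop, its call function accepts a training argument.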

So you can control it with

    dropout = tf.keras.layers.Dropout(rate, noise_shape, seed)(prev_layer, training=is_training)

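From the official TF documentation:

Note:
- The following optional keyword arguments are reserved for specific uses:
  * training: a boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
  * mask: a boolean input mask.
- If the layer's call method takes a mask argument (as some Keras layers do), its default value will be set to the mask generated for the inputs by the previous layer (if the input did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dropout#call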
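As a rough illustration of what the training flag switches on and off, here is a NumPy sketch of inverted dropout. It is not Keras's implementation; the function name `dropout` and its `rng` parameter are made up for this example. It may also help explain the symptom in the question: a dropout layer running in inference mode passes its input through unchanged, whatever the rate.

```python
import numpy as np


def dropout(x, rate, training, rng):
    """Illustrative inverted dropout; hypothetical helper, not the Keras code."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("this sketch only handles 0 <= rate < 1")
    if not training or rate == 0.0:
        # Inference mode: identity, regardless of `rate`.
        return x
    # Training mode: drop each unit with probability `rate` and rescale
    # the survivors so the expected value of the output matches the input.
    keep_mask = rng.random(x.shape) >= rate
    return np.where(keep_mask, x / (1.0 - rate), 0.0)


rng = np.random.default_rng(0)
x = np.ones((2, 4))

out_infer = dropout(x, rate=0.5, training=False, rng=rng)
out_train = dropout(x, rate=0.5, training=True, rng=rng)

print(out_infer)  # identical to x
print(out_train)  # every entry is either 0.0 or 2.0
```

With rate=0.5 the surviving entries are scaled by 1 / (1 - 0.5) = 2, which is why the training-mode output contains only zeros and twos for an all-ones input.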

Answer 1: (score: 1)

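As @Sharky already said, you can use the training argument when invoking the call() method of the Dropout class. However, if you want to train in TensorFlow graph mode, you need to pass a placeholder and feed it a boolean value during training. Here is an example of fitting Gaussian blobs, adapted to your case: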

import tensorflow as tf
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import ReLU
from tensorflow.keras.layers import Input
from tensorflow.keras import Model

x_train, y_train = make_blobs(n_samples=10,
                              n_features=2,
                              centers=[[1, 1], [-1, -1]],
                              cluster_std=1)

x_train, x_test, y_train, y_test = train_test_split(
    x_train, y_train, test_size=0.2)

# `istrain` indicates whether it is inference or training
istrain = tf.placeholder(tf.bool, shape=())
y = tf.placeholder(tf.int32, shape=(None,))
net_input = Input(shape=(2,))
net_1 = Dense(2)
net_2 = Dense(2)
net_3 = Dropout(0.5)
net = Model(net_input, net_3(net_2(net_1(net_input)), training=istrain))

xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=y, logits=net.output)
loss_fn = tf.reduce_mean(xentropy)

optimizer = tf.train.AdamOptimizer(0.01)
grads_and_vars = optimizer.compute_gradients(loss_fn,
                                             var_list=net.trainable_variables)
trainstep = optimizer.apply_gradients(grads_and_vars)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    l1 = loss_fn.eval({net_input:x_train,
                       y:y_train,
                       istrain:True}) # apply dropout
    print(l1) # 1.6264652
    l2 = loss_fn.eval({net_input:x_train,
                       y:y_train,
                       istrain:False}) # no dropout
    print(l2) # 1.5676715
    sess.run(trainstep, feed_dict={net_input:x_train,
                                   y:y_train, 
                                   istrain:True}) # train with dropout
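
For the case in the question, where the gradients are modified between compute_gradients and apply_gradients, a variant along the following lines might work. It is only a sketch under several assumptions: it uses plain TensorFlow ops through tf.compat.v1 (so it needs a TensorFlow version that provides that module) rather than Keras layers, it swaps the fed placeholder for tf.compat.v1.placeholder_with_default so that inference runs need not feed the flag, and it uses gradient clipping purely as a stand-in for whatever gradient operation is needed. The variable names and the toy data are made up.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1  # TF 1.x-style graph API (assumed to be available)

graph = tf.Graph()
with graph.as_default():
    x = tf1.placeholder(tf.float32, shape=(None, 2))
    y = tf1.placeholder(tf.int32, shape=(None,))
    # Defaults to False, so inference runs do not have to feed the flag.
    istrain = tf1.placeholder_with_default(False, shape=())

    w = tf1.get_variable("w", shape=(2, 2))
    b = tf1.get_variable("b", shape=(2,), initializer=tf1.zeros_initializer())
    hidden = tf.matmul(x, w) + b
    # Apply dropout only when the flag is True.
    logits = tf.cond(istrain,
                     lambda: tf1.nn.dropout(hidden, rate=0.5),
                     lambda: hidden)
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,
                                                       logits=logits))

    optimizer = tf1.train.AdamOptimizer(0.01)
    grads_and_vars = optimizer.compute_gradients(loss, var_list=[w, b])
    # Stand-in gradient operation: clip each gradient to norm 1.0.
    clipped = [(tf.clip_by_norm(g, 1.0), v) for g, v in grads_and_vars]
    trainstep = optimizer.apply_gradients(clipped)
    init = tf1.global_variables_initializer()

# Made-up toy data: two points, one per class.
x_data = np.array([[1.0, 1.0], [-1.0, -1.0]], dtype=np.float32)
y_data = np.array([0, 1], dtype=np.int32)

with tf1.Session(graph=graph) as sess:
    sess.run(init)
    feed = {x: x_data, y: y_data}
    eval_loss_1 = sess.run(loss, feed_dict=feed)  # flag not fed: no dropout
    eval_loss_2 = sess.run(loss, feed_dict=feed)  # same value, no randomness
    sess.run(trainstep,
             feed_dict={x: x_data, y: y_data, istrain: True})  # with dropout
    eval_loss_3 = sess.run(loss, feed_dict=feed)  # loss after one step
```

Because the flag defaults to False, the two evaluation runs before the training step are deterministic and return the same loss; only the run that feeds istrain: True samples a dropout mask.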