How to apply a custom gradient?

Time: 2019-05-13 18:08:51

Tags: python tensorflow deep-learning gradient

I want to create a custom layer and apply a custom gradient to it. After implementing it, I checked that grad_func is being executed through optimizer.compute_gradients, but the custom gradient does not seem to end up being applied to the training results.

So far, my method for checking whether the custom gradient works has been to compare values before and after running optimizer.compute_gradients. Does anyone know how to apply the gradient and verify it?
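One way to verify a custom gradient independently of the training loop is TF1's built-in gradient checker, which compares the symbolic gradient produced by the registered grad function against a numeric finite-difference estimate; a near-zero error means the custom gradient is both wired in and correct. A minimal sketch with a toy op (not the question's code):

import tensorflow as tf

@tf.custom_gradient
def double(x):
    def grad(dy):
        return 2.0 * dy  # hand-written gradient of y = 2 * x
    return 2.0 * x, grad

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = double(x)

with tf.Session():
    # compares tf.gradients(y, x) (i.e. our grad) against a numeric estimate
    err = tf.test.compute_gradient_error(x, [2, 2], y, [2, 2])
    print('max gradient error:', err)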

I have spent a lot of time trying to solve this problem, but due to my inexperience I am stuck.

The full source can be seen here

The overall model is as follows:

model = ResnetModel(RESNET_SIZE, num_classes=num_classes)

# ResNet backbone
with tf.variable_scope(scope, 'ten', [inputs], reuse=reuse):
    net = model(inputs, training=is_training)

# 1x1 projection of the backbone features down to DIMENSION channels
batch_norm_params['is_training'] = True
net = slim.conv2d(net, DIMENSION, [1, 1],
                  weights_regularizer=slim.l2_regularizer(0.0001),
                  weights_initializer=slim.variance_scaling_initializer(),
                  activation_fn=tf.nn.relu,
                  scope='projection')

# custom encoding layer, flattened and L2-normalized before the classifier
with tf.variable_scope('encoding'):
    enc = encoding.encoding_layer(net, D=DIMENSION, K=NUM_CODEWORDS)
net = tf.reshape(enc, [-1, NUM_CODEWORDS * DIMENSION])
net = tf.math.l2_normalize(net, axis=1)
logits = slim.fully_connected(net, num_classes, activation_fn=None, scope='logits')

The code that applies the custom gradient is below. (Edit: changed to use tf.custom_gradient, as jdehesa suggested.)

def encoding_layer(inputs, D, K):

    global batch_size
    batch_size = inputs.get_shape().as_list()[0]

    # init codewords and smoothing factor (learnable parameters)
    std1 = 1. / ((K * D) ** (1 / 2))
    codewords = slim.model_variable(name='codewords',
                                    initializer=tf.random_uniform(shape=(K, D), minval=-std1, maxval=std1),
                                    regularizer=slim.l2_regularizer(0.05))
    scale = slim.model_variable(name='scale',
                                initializer=tf.random_uniform(shape=(K,), minval=-1, maxval=0),
                                regularizer=slim.l2_regularizer(0.05))

    # BxHxWxD => Bx(HW)xD (BxNxD)
    X = tf.reshape(inputs, [-1, inputs.shape[1] * inputs.shape[2], inputs.shape[3]], name='input')

    # the variables are passed as explicit inputs so grad() can return
    # their gradients positionally
    return encoding(X, codewords, scale)


@tf.custom_gradient
def encoding(X, C, S):
    '''
    :param X: input features, B x N x D
    :param C: codewords, K x D
    :param S: smoothing factors, shape (K,)
    :return E: residual encoding vectors, B x K x D
    '''

    # forward logic...

    def grad(gradE):

        # backward logic...

        return GX, GC, GS   # <- is it correct to return these values?

    return E, grad
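On the comment above: yes, with tf.custom_gradient the inner grad function receives the upstream gradient and must return one gradient per input of the decorated function, in the same order, so returning GX, GC, GS for the inputs X, C, S is the right convention. A minimal sketch with a toy op (not the encoding op itself):

import tensorflow as tf

@tf.custom_gradient
def scale_by(x, s):
    y = x * s  # forward pass

    def grad(dy):
        # one gradient per input, in the same order as (x, s)
        dx = dy * s
        ds = tf.reduce_sum(dy * x)  # s is a scalar, so reduce over all elements
        return dx, ds

    return y, grad

This positional convention works here because codewords and scale are passed to encoding as explicit inputs; if the decorated function instead created variables internally, TF1 would expect the grad function to accept a variables=None keyword argument and return the variable gradients as a separate list.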

Below are the gradient tensors produced by custom_gradient and the grads_and_vars returned by optimizer.compute_gradients -> the custom gradient still does not seem to be applied. Do I need to apply it manually, or is there some other step?

# Result gradients of custom_gradient
GC = {Tensor} Tensor("gradients/encoding/IdentityN_grad/Mul_6:0", shape=(32, 128), dtype=float32)
GS = {Tensor} Tensor("gradients/encoding/IdentityN_grad/Sum_7:0", shape=(32,), dtype=float32)
GX = {Tensor} Tensor("gradients/encoding/IdentityN_grad/Mul_5:0", shape=(16, 100, 128), dtype=float32)

# grads_and_vars after optimizer.compute_gradients
.
.
100 = {tuple} <class 'tuple'>: (<tf.Tensor 'gradients/ten/resnet_model/conv2d_36/Conv2D_grad/tuple/control_dependency_1:0' shape=(3, 3, 512, 512) dtype=float32>, <tf.Variable 'ten/resnet_model/conv2d_36/kernel:0' shape=(3, 3, 512, 512) dtype=float32_ref>)
101 = {tuple} <class 'tuple'>: (<tf.Tensor 'gradients/AddN_8:0' shape=(512,) dtype=float32>, <tf.Variable 'ten/resnet_model/batch_normalization_32/gamma:0' shape=(512,) dtype=float32_ref>)
102 = {tuple} <class 'tuple'>: (<tf.Tensor 'gradients/AddN_9:0' shape=(512,) dtype=float32>, <tf.Variable 'ten/resnet_model/batch_normalization_32/beta:0' shape=(512,) dtype=float32_ref>)
103 = {tuple} <class 'tuple'>: (<tf.Tensor 'gradients/AddN_6:0' shape=(1, 1, 512, 128) dtype=float32>, <tf.Variable 'projection/weights:0' shape=(1, 1, 512, 128) dtype=float32_ref>)
104 = {tuple} <class 'tuple'>: (<tf.Tensor 'gradients/AddN_4:0' shape=(128,) dtype=float32>, <tf.Variable 'projection/BatchNorm/gamma:0' shape=(128,) dtype=float32_ref>)
105 = {tuple} <class 'tuple'>: (<tf.Tensor 'gradients/AddN_5:0' shape=(128,) dtype=float32>, <tf.Variable 'projection/BatchNorm/beta:0' shape=(128,) dtype=float32_ref>)
106 = {tuple} <class 'tuple'>: (<tf.Tensor 'gradients/AddN_2:0' shape=(32, 128) dtype=float32>, <tf.Variable 'encoding/codewords:0' shape=(32, 128) dtype=float32_ref>)
107 = {tuple} <class 'tuple'>: (<tf.Tensor 'gradients/AddN_1:0' shape=(32,) dtype=float32>, <tf.Variable 'encoding/scale:0' shape=(32,) dtype=float32_ref>)
108 = {tuple} <class 'tuple'>: (<tf.Tensor 'gradients/logits/MatMul_grad/tuple/control_dependency_1:0' shape=(4096, 5) dtype=float32>, <tf.Variable 'logits/weights:0' shape=(4096, 5) dtype=float32_ref>)
109 = {tuple} <class 'tuple'>: (<tf.Tensor 'gradients/logits/BiasAdd_grad/tuple/control_dependency_1:0' shape=(5,) dtype=float32>, <tf.Variable 'logits/biases:0' shape=(5,) dtype=float32_ref>)
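On the "apply manually" question: in the TF1 optimizer API, compute_gradients only builds the gradient tensors; the variables are not touched until the pairs are passed to apply_gradients and the resulting op is actually run (optimizer.minimize does both steps in one call). A minimal self-contained sketch with a toy variable and loss (not the question's model):

import tensorflow as tf

x = tf.Variable(1.0)
loss = tf.square(x)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)

grads_and_vars = optimizer.compute_gradients(loss)    # builds gradient tensors only
train_op = optimizer.apply_gradients(grads_and_vars)  # builds the update op

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(x))  # 1.0 -- nothing has been applied yet
    sess.run(train_op)  # gradients are applied only when this op runs
    print(sess.run(x))  # 0.8 = 1.0 - 0.1 * d(x^2)/dx at x = 1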

I would be grateful to anyone who can shed some light on this.

Thanks.

0 Answers:

No answers yet.