I want to create a custom layer and apply a custom gradient to it. After finishing it, I checked that grad_func is being run through optimizer.compute_gradients, but the custom gradient does not seem to end up being applied to the training result.
So far, the only way I have to check whether the custom gradient works is to compare the values before and after running the optimizer.compute_gradients function. Does anyone know how to apply it and verify it?
I have spent a lot of time trying to solve this, but I am stuck because of my inexperience.
The full source can be seen here.
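For context, the before/after check I have in mind looks roughly like this (only a sketch; loss, optimizer, and feed stand in for my actual training setup):

# Sketch: verify that a learnable variable actually changes after one step.
grads_and_vars = optimizer.compute_gradients(loss)
train_op = optimizer.apply_gradients(grads_and_vars)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    codewords = tf.get_default_graph().get_tensor_by_name('encoding/codewords:0')
    before = sess.run(codewords)
    sess.run(train_op, feed_dict=feed)  # one training step
    after = sess.run(codewords)
    print('codewords changed:', not (before == after).all())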
The overall model is as follows:
model = ResnetModel(RESNET_SIZE, num_classes=num_classes)
with tf.variable_scope(scope, 'ten', [inputs], reuse=reuse):
    net = model(inputs, training=is_training)
    batch_norm_params['is_training'] = True
    net = slim.conv2d(net, DIMENSION, [1, 1],
                      weights_regularizer=slim.l2_regularizer(0.0001),
                      weights_initializer=slim.variance_scaling_initializer(),
                      activation_fn=tf.nn.relu,
                      scope='projection')
    with tf.variable_scope('encoding'):
        enc = encoding.encoding_layer(net, D=DIMENSION, K=NUM_CODEWORDS)
    net = tf.reshape(enc, [-1, NUM_CODEWORDS * DIMENSION])
    net = tf.math.l2_normalize(net, axis=1)
    logits = slim.fully_connected(net, num_classes, activation_fn=None, scope='logits')
The part where the custom gradient is applied is below. (Edit: changed to use tf.custom_gradient as jdehesa suggested.)
def encoding_layer(inputs, D, K):
    global batch_size
    batch_size = inputs.get_shape().as_list()[0]
    # init codewords and smoothing factor (learnable parameters)
    std1 = 1. / ((K * D) ** (1 / 2))
    codewords = slim.model_variable(name='codewords',
                                    initializer=tf.random_uniform(shape=(K, D), minval=-std1, maxval=std1),
                                    regularizer=slim.l2_regularizer(0.05))
    scale = slim.model_variable(name='scale',
                                initializer=tf.random_uniform(shape=(K,), minval=-1, maxval=0),
                                regularizer=slim.l2_regularizer(0.05))
    # BxHxWxD => Bx(HW)xD (BxNxD)
    X = tf.reshape(inputs, [-1, inputs.shape[1] * inputs.shape[2], inputs.shape[3]], name='input')
    return encoding(X, codewords, scale)
@tf.custom_gradient
def encoding(X, C, S):
    '''
    :param X: input features, B x N x D
    :param C: codewords, K x D
    :param S: smoothing factors, (K,)
    :return E: N residual encoding vectors, B x K x D
    '''
    # forward logic...
    def grad(gradE):
        # backward logic...
        return GX, GC, GS  # <- Is it correct to return the values like this?
    return E, grad
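To make sure I understand the contract, I also tried it on a toy function first. My understanding is that grad has to return one gradient per input of the decorated function, in the same order as the inputs (so GX, GC, GS for X, C, S above). A self-contained sketch (toy math, not my real encoding logic):

import tensorflow as tf  # TF 1.x

@tf.custom_gradient
def toy(x, c, s):
    y = x * c + s
    def grad(dy):
        # one gradient per input, in input order
        return dy * c, dy * x, dy
    return y, grad

x = tf.constant(3.0)
c = tf.constant(2.0)
s = tf.constant(1.0)
y = toy(x, c, s)
gx, gc, gs = tf.gradients(y, [x, c, s])

with tf.Session() as sess:
    print(sess.run([gx, gc, gs]))  # expect [2.0, 3.0, 1.0]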
The resulting gradients of custom_gradient and the grads_and_vars after optimizer.compute_gradients are below -> they do not seem to be applied yet. Do I need to apply them manually, or is there some other procedure?
# Result gradients of custom_gradient
GC = {Tensor} Tensor("gradients/encoding/IdentityN_grad/Mul_6:0", shape=(32, 128), dtype=float32)
GS = {Tensor} Tensor("gradients/encoding/IdentityN_grad/Sum_7:0", shape=(32,), dtype=float32)
GX = {Tensor} Tensor("gradients/encoding/IdentityN_grad/Mul_5:0", shape=(16, 100, 128), dtype=float32)
# grads_and_vars after optimizer.compute_gradients
.
.
100 = {tuple} <class 'tuple'>: (<tf.Tensor 'gradients/ten/resnet_model/conv2d_36/Conv2D_grad/tuple/control_dependency_1:0' shape=(3, 3, 512, 512) dtype=float32>, <tf.Variable 'ten/resnet_model/conv2d_36/kernel:0' shape=(3, 3, 512, 512) dtype=float32_ref>)
101 = {tuple} <class 'tuple'>: (<tf.Tensor 'gradients/AddN_8:0' shape=(512,) dtype=float32>, <tf.Variable 'ten/resnet_model/batch_normalization_32/gamma:0' shape=(512,) dtype=float32_ref>)
102 = {tuple} <class 'tuple'>: (<tf.Tensor 'gradients/AddN_9:0' shape=(512,) dtype=float32>, <tf.Variable 'ten/resnet_model/batch_normalization_32/beta:0' shape=(512,) dtype=float32_ref>)
103 = {tuple} <class 'tuple'>: (<tf.Tensor 'gradients/AddN_6:0' shape=(1, 1, 512, 128) dtype=float32>, <tf.Variable 'projection/weights:0' shape=(1, 1, 512, 128) dtype=float32_ref>)
104 = {tuple} <class 'tuple'>: (<tf.Tensor 'gradients/AddN_4:0' shape=(128,) dtype=float32>, <tf.Variable 'projection/BatchNorm/gamma:0' shape=(128,) dtype=float32_ref>)
105 = {tuple} <class 'tuple'>: (<tf.Tensor 'gradients/AddN_5:0' shape=(128,) dtype=float32>, <tf.Variable 'projection/BatchNorm/beta:0' shape=(128,) dtype=float32_ref>)
106 = {tuple} <class 'tuple'>: (<tf.Tensor 'gradients/AddN_2:0' shape=(32, 128) dtype=float32>, <tf.Variable 'encoding/codewords:0' shape=(32, 128) dtype=float32_ref>)
107 = {tuple} <class 'tuple'>: (<tf.Tensor 'gradients/AddN_1:0' shape=(32,) dtype=float32>, <tf.Variable 'encoding/scale:0' shape=(32,) dtype=float32_ref>)
108 = {tuple} <class 'tuple'>: (<tf.Tensor 'gradients/logits/MatMul_grad/tuple/control_dependency_1:0' shape=(4096, 5) dtype=float32>, <tf.Variable 'logits/weights:0' shape=(4096, 5) dtype=float32_ref>)
109 = {tuple} <class 'tuple'>: (<tf.Tensor 'gradients/logits/BiasAdd_grad/tuple/control_dependency_1:0' shape=(5,) dtype=float32>, <tf.Variable 'logits/biases:0' shape=(5,) dtype=float32_ref>)
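For what it's worth, the IdentityN_grad names above look like the wrapper that tf.custom_gradient inserts, so the custom grad function does seem to be wired into the graph, and entries 106 and 107 show gradients for encoding/codewords and encoding/scale. My (possibly wrong) understanding is that optimizer.compute_gradients only builds these gradient tensors, and the variables are only updated when apply_gradients (or minimize) is actually run. This is roughly what my training step looks like (a sketch; global_step and feed are placeholders):

grads_and_vars = optimizer.compute_gradients(loss)
train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # fetching a gradient tensor only evaluates it; it does not update anything
    gc_val, gs_val = sess.run([grads_and_vars[106][0], grads_and_vars[107][0]],
                              feed_dict=feed)  # codewords / scale grads
    # the variables only change when train_op itself is run
    sess.run(train_op, feed_dict=feed)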
I am hoping someone can enlighten me.
Thank you.