I want to implement the following weight normalization and hook it into layers.dense() through its kernel_constraint argument. The code creates one variable per layer (initialized from an isotropic distribution), and that variable is updated on every training iteration. The code that updates a single layer is:
import tensorflow as tf

def spectral_norm(w, iteration=1):
    w_shape = w.shape.as_list()
    # Flatten the kernel into a 2-D matrix of shape (prod(other dims), out_dim).
    w = tf.reshape(w, [-1, w_shape[-1]])

    # Persistent estimate of the dominant singular direction.
    u = tf.get_variable("u", [1, w_shape[-1]],
                        initializer=tf.random_normal_initializer(),
                        trainable=False)

    u_hat = u
    v_hat = None
    for i in range(iteration):
        # Power iteration; usually iteration = 1 is enough.
        v_ = tf.matmul(u_hat, tf.transpose(w))
        v_hat = tf.nn.l2_normalize(v_)
        u_ = tf.matmul(v_hat, w)
        u_hat = tf.nn.l2_normalize(u_)

    u_hat = tf.stop_gradient(u_hat)
    v_hat = tf.stop_gradient(v_hat)

    # Estimate of the largest singular value of w.
    sigma = tf.matmul(tf.matmul(v_hat, w), tf.transpose(u_hat))

    # Update the stored u before returning the normalized weights.
    with tf.control_dependencies([u.assign(u_hat)]):
        w_norm = w / sigma
        w_norm = tf.reshape(w_norm, w_shape)

    return w_norm
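For reference, each pass through the loop above performs one power-iteration step to estimate the largest singular value sigma(W) of the flattened matrix:

    v <- l2_normalize(u @ transpose(W))
    u <- l2_normalize(v @ W)
    sigma(W) ~= v @ W @ transpose(u)

Dividing w by this estimate (spectral normalization) constrains the layer to be 1-Lipschitz with respect to the l2 norm.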
However, when I build the network with
for units in hidden_layers:
    x = layers.dense(
        inputs=x,
        units=units,
        activation=activation,
        kernel_constraint=spectral_norm,
        *args,
        **kwargs)
this causes a number of problems; for example, tf.get_variable raises an error because the variable "u" already exists once the second layer tries to create it in the same scope. Is there a correct way to combine these two pieces?
Answer (score: 0)
I ran into this problem on TF 1.13. I am still learning how to use spectral normalization, so I am not sure this is the correct answer, but I think the problem is that you are trying to reuse the "u" variable, which may have a different shape in each layer. I added a name argument to the spectral_norm function:
def spectral_norm(w, iteration=1, nombre='u'):
    w_shape = w.shape.as_list()
    w = tf.reshape(w, [-1, w_shape[-1]])

    # Each layer gets its own variable scope, so calls with distinct
    # `nombre` values create separate "u" variables instead of colliding.
    with tf.variable_scope(nombre, reuse=tf.AUTO_REUSE):
        u = tf.get_variable(nombre, [1, w_shape[-1]],
                            initializer=tf.random_normal_initializer(),
                            trainable=False)

    u_hat = u
    v_hat = None
    for i in range(iteration):
        # Power iteration; usually iteration = 1 is enough.
        v_ = tf.matmul(u_hat, tf.transpose(w))
        v_hat = tf.nn.l2_normalize(v_)
        u_ = tf.matmul(v_hat, w)
        u_hat = tf.nn.l2_normalize(u_)

    u_hat = tf.stop_gradient(u_hat)
    v_hat = tf.stop_gradient(v_hat)

    sigma = tf.matmul(tf.matmul(v_hat, w), tf.transpose(u_hat))

    with tf.control_dependencies([u.assign(u_hat)]):
        w_norm = w / sigma
        w_norm = tf.reshape(w_norm, w_shape)

    return w_norm
Then I declared the kernel constraint as a lambda function with a different variable name for each layer; reuse=tf.AUTO_REUSE lets repeated calls for the same layer reuse its "u" instead of raising an error:
for idx_layer, units in enumerate(hidden_layers):
    # Bind idx_layer via a default argument so each lambda keeps its own
    # layer index rather than the loop's final value at call time.
    kern_const = lambda w, idx=idx_layer: spectral_norm(
        w, iteration=1, nombre='layer_%d_u' % idx)
    x = layers.dense(inputs=x,
                     units=units,
                     activation=activation,
                     kernel_constraint=kern_const,
                     *args,
                     **kwargs)
This code seems to run correctly, but it feels unnatural...
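One way to avoid writing a lambda per layer (a sketch I have not tested, assuming TF 1.x graph mode, where each kernel variable has a unique graph name such as "dense_1/kernel") is to derive the scope name from the weight tensor itself:

def spectral_norm_auto(w, iteration=1):
    # Hypothetical wrapper: build the scope name from the kernel's own
    # graph name, so no manual per-layer naming is needed.
    scope = w.op.name.replace('/', '_').replace(':', '_') + '_sn'
    return spectral_norm(w, iteration=iteration, nombre=scope)

for units in hidden_layers:
    x = layers.dense(inputs=x,
                     units=units,
                     activation=activation,
                     kernel_constraint=spectral_norm_auto)

With this, spectral_norm_auto can be passed directly as the kernel_constraint, and each layer still gets its own "u" variable.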