TensorFlow: applying spectral normalization to layers.dense() via kernel_constraint

Time: 2018-12-03 23:18:53

Tags: tensorflow

I want to implement the following weight normalization and incorporate it into layers.dense() via kernel_constraint:

[Image: the spectral normalization formula, W_SN = W / σ(W), where σ(W) is the largest singular value of W]

The code creates one variable per layer (initialized from an isotropic distribution), and that variable is updated on every training iteration.

The code for updating a single layer is:

import tensorflow as tf

def spectral_norm(w, iteration=1):
    w_shape = w.shape.as_list()
    w = tf.reshape(w, [-1, w_shape[-1]])

    u = tf.get_variable("u", [1, w_shape[-1]], initializer=tf.random_normal_initializer(), trainable=False)

    u_hat = u
    v_hat = None
    for i in range(iteration):
        """
        power iteration
        Usually iteration = 1 will be enough
        """
        v_ = tf.matmul(u_hat, tf.transpose(w))
        v_hat = tf.nn.l2_normalize(v_)

        u_ = tf.matmul(v_hat, w)
        u_hat = tf.nn.l2_normalize(u_)

    u_hat = tf.stop_gradient(u_hat)
    v_hat = tf.stop_gradient(v_hat)

    sigma = tf.matmul(tf.matmul(v_hat, w), tf.transpose(u_hat))

    with tf.control_dependencies([u.assign(u_hat)]):
        w_norm = w / sigma
        w_norm = tf.reshape(w_norm, w_shape)

    return w_norm
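To see what the power iteration inside spectral_norm is computing, here is a standalone NumPy sketch of the same loop (the matrix and iteration count are illustrative, not from the question). After enough iterations, sigma converges to the largest singular value of w; during training, iteration=1 per step is usually enough because u persists between steps.

```python
import numpy as np

# Illustrative random weight matrix (stands in for the reshaped kernel).
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32))

# u has shape [1, w_shape[-1]], as in the question's code.
u_hat = rng.standard_normal((1, 32))
for _ in range(200):  # many iterations here only to show convergence
    v_hat = u_hat @ w.T
    v_hat = v_hat / np.linalg.norm(v_hat)

    u_hat = v_hat @ w
    u_hat = u_hat / np.linalg.norm(u_hat)

# sigma = v^T w u, the spectral-norm estimate.
sigma = (v_hat @ w @ u_hat.T).item()

top_singular_value = np.linalg.svd(w, compute_uv=False)[0]
print(sigma, top_singular_value)  # the two values should be very close
```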

However, when building the neural network with

  for units in hidden_layers:
    x = layers.dense(
        inputs=x,
        units=units,
        activation=activation,
        kernel_constraint=spectral_norm,
        *args,
        **kwargs)

this causes a number of problems; for example, the variable has already been created.
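The "variable already exists" failure mode can be sketched without TensorFlow at all: tf.get_variable raises when a variable with the same name already exists in the current scope and reuse is not enabled, and spectral_norm asks for the fixed name "u" on every layer. A minimal plain-Python imitation of that behavior (the registry and stub below are hypothetical, for illustration only):

```python
# Hypothetical stand-in for a TF1 variable store keyed by name.
_variables = {}

def get_variable(name, shape):
    # Mimics tf.get_variable without reuse: a second request for the
    # same name fails instead of silently creating a duplicate.
    if name in _variables:
        raise ValueError("Variable %s already exists" % name)
    _variables[name] = shape
    return _variables[name]

def spectral_norm_stub(w_shape):
    # Every call asks for the same fixed name "u", as in the question.
    return get_variable("u", [1, w_shape[-1]])

spectral_norm_stub([64, 32])      # first dense layer: fine
err = None
try:
    spectral_norm_stub([32, 16])  # second dense layer: name clash
except ValueError as e:
    err = str(e)
print(err)
```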

Is there a correct way to combine the two?

1 answer:

Answer 0 (score: 0)

I ran into this problem with TF 1.13.

I am currently learning how to use spectral normalization myself, so I am not sure this is the right answer, but I think the problem is that you are trying to reuse the "u" variable, which may have a different shape in each layer.

I added a name argument to the spectral_norm function:

def spectral_norm(w, iteration=1, nombre='u'):
    w_shape = w.shape.as_list()
    w = tf.reshape(w, [-1, w_shape[-1]])

    with tf.variable_scope(nombre, reuse=tf.AUTO_REUSE):
        u = tf.get_variable(nombre, [1, w_shape[-1]], initializer=tf.random_normal_initializer(), trainable=False)

    u_hat = u
    v_hat = None
    for i in range(iteration):
        """
        power iteration
        Usually iteration = 1 will be enough
        """
        v_ = tf.matmul(u_hat, tf.transpose(w))
        v_hat = tf.nn.l2_normalize(v_)

        u_ = tf.matmul(v_hat, w)
        u_hat = tf.nn.l2_normalize(u_)

    u_hat = tf.stop_gradient(u_hat)
    v_hat = tf.stop_gradient(v_hat)

    sigma = tf.matmul(tf.matmul(v_hat, w), tf.transpose(u_hat))

    with tf.control_dependencies([u.assign(u_hat)]):
        w_norm = w / sigma
        w_norm = tf.reshape(w_norm, w_shape)

    return w_norm

and then declared the kernel constraint as a lambda function with a different variable name per layer:

for idx_layer, units in enumerate(hidden_layers):
    # Bind idx_layer as a default argument so each lambda keeps its own
    # layer index even if the constraint is only invoked after the loop ends.
    kern_const = lambda x, idx=idx_layer: spectral_norm(x, iteration=1, nombre='layer_%d_u' % idx)
    x = layers.dense(inputs=x,
        units=units,
        activation=activation,
        kernel_constraint=kern_const,
        *args,
        **kwargs)

This code seems to run correctly, but it feels unnatural...
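One caveat worth knowing when defining the constraint lambda in a loop (a general Python behavior, not specific to TensorFlow): a lambda captures the loop variable by reference, not by value, so if it is called only after the loop finishes, every lambda sees the final index. Binding the index as a default argument pins the value per iteration. A minimal demonstration:

```python
# Late binding: all three lambdas share the same 'i', which ends at 2.
late = [lambda: 'layer_%d_u' % i for i in range(3)]

# Default-argument capture: each lambda stores its own copy of 'i'.
safe = [lambda i=i: 'layer_%d_u' % i for i in range(3)]

print([f() for f in late])  # every call reports the last index
print([f() for f in safe])  # each call reports its own index
```

This is why binding the layer index explicitly (e.g. `lambda x, idx=idx_layer: ...`) is the safer pattern when the constraint may be invoked lazily.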