Question

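My goal is to train a network in stages: first train a subset of the weights, then train all of the weights. Consider the two architectures given here. Start with "Network 1", which consists of one input scalar z_1 and two nodes with weights (w_11, w_21) and biases (b_1, b_2), respectively. "Network 2" extends "Network 1" by adding an input node (z_2), and therefore also adds one scalar weight to each node (w_12, w_22). In "Network 2", (w_11, w_21) and (b_1, b_2) are initialized from the training results of "Network 1", while (w_12, w_22) are initialized some other way.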

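I know how to save and restore a subset of weights (see here and here). However, the methods described in those links don't work with fully connected layers such as tf.layers.dense(...); they only work when restoring a subset of variables created with tf.Variable(...). I may need to write a custom layer for this, but I'm not sure. How can I achieve my goal?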
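For reference, this is how the notation above maps onto a tf.layers.dense kernel, which has shape [n_inputs, n_units] and computes activation(Z @ kernel + bias): row 0 holds (w_11, w_21) and the added row 1 holds (w_12, w_22). The part to restore is therefore a slice of one variable rather than a separate variable. A minimal NumPy sketch under those assumptions; the helpers leaky_relu and dense, the random values, and the slope alpha=0.2 (the default of tf.nn.leaky_relu) are illustrative choices, not part of the original code.

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    # Same default slope as tf.nn.leaky_relu (alpha=0.2)
    return np.where(x > 0, x, alpha * x)

def dense(z, kernel, bias):
    # Dense-layer convention: kernel has shape [n_inputs, n_units],
    # so kernel[i, j] is the weight from input i to node j.
    return leaky_relu(z @ kernel + bias)

rng = np.random.default_rng(0)

# "Network 1": one input z_1, two nodes.
# Row 0 of the kernel is (w_11, w_21); the bias is (b_1, b_2).
kernel1 = rng.uniform(-1.0, 1.0, size=(1, 2))
bias1 = rng.uniform(-1.0, 1.0, size=(2,))

# "Network 2": add input z_2. The new row 1 holds (w_12, w_22);
# a random draw stands in for "some other" initialization (assumption).
new_row = rng.normal(0.0, 0.1, size=(1, 2))
kernel2 = np.vstack([kernel1, new_row])  # shape (2, 2)
bias2 = bias1.copy()

# Sanity check: with z_2 = 0, Network 2 reproduces Network 1.
z1 = rng.uniform(-1.0, 1.0, size=(5, 1))
z = np.hstack([z1, np.zeros_like(z1)])
out1 = dense(z1, kernel1, bias1)
out2 = dense(z, kernel2, bias2)
print(kernel2.shape)
print(np.allclose(out1, out2))
```

Feeding z_2 = 0 is only a check: whatever values (w_12, w_22) start at, they contribute nothing, so the outputs must match if row 0 and the biases were carried over correctly.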

To give some background, the script below saves "Network 1":

import tensorflow as tf
import numpy as np
def generator(Z,reuse=False):
    with tf.variable_scope("restore"):
        h1 = tf.layers.dense(Z,2,activation=tf.nn.leaky_relu, name='h1')
    return h1

Z = tf.placeholder(tf.float32,[None,1])
G_sample = generator(Z)
Z_batch = np.random.uniform(-1., 1., size=[1, 1])
# Save only the variables created under the "restore" scope
saver = tf.train.Saver(tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="restore"))
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    saver.save(sess, 'test')
    print('restore/h1/bias:0 :', sess.run(tf.get_default_graph().get_tensor_by_name("restore/h1/bias:0")))
    print('restore/h1/kernel:0 :', sess.run(tf.get_default_graph().get_tensor_by_name("restore/h1/kernel:0")))

This gives the output:

restore/h1/bias:0 : [0. 0.]
restore/h1/kernel:0 : [[-0.7695515  1.2254907]]

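The script below restores the graph saved by the script above and extends it with two weights. Note: when z_dim = 1 the code runs fine (it simply restores the same graph as before), but when z_dim = 2 it fails, because the saved kernel of layer "h1" has shape [1, 2] while the new one has shape [2, 2], so the saver does not know which weights to restore.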

import tensorflow as tf
import numpy as np
def generator(Z,reuse=False):
    with tf.variable_scope("restore"):
        h1 = tf.layers.dense(Z,2,activation=tf.nn.leaky_relu, name='h1')
    return h1

# Define z_dim first so the placeholder width follows it
z_dim = 2
Z = tf.placeholder(tf.float32,[None,z_dim])
G_sample = generator(Z)
Z_batch = np.random.uniform(-1., 1., size=[1, z_dim])

# Read the checkpoint written by the first script (same path as saver.restore below)
reader = tf.train.NewCheckpointReader('test')
restore_dict = dict()
for v in tf.trainable_variables():
    tensor_name = v.name.split(':')[0]
    if reader.has_tensor(tensor_name):
        print('has tensor ', tensor_name)
        restore_dict[tensor_name] = v

print('restore_dict:', restore_dict)
init_op = tf.global_variables_initializer()
saver = tf.train.Saver(restore_dict)

with tf.Session() as sess:
    sess.run(init_op)
    saver.restore(sess, 'test')
    print('restore/h1/bias:0 :',sess.run(tf.get_default_graph().get_tensor_by_name("restore/h1/bias:0")))
    print('restore/h1/kernel:0 :',sess.run(tf.get_default_graph().get_tensor_by_name("restore/h1/kernel:0")))
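The z_dim = 2 run fails because reader.has_tensor() matches by name only: restore/h1/kernel still lands in restore_dict even though the checkpoint holds a [1, 2] tensor and the new variable is [2, 2], and the Saver rejects the shape mismatch. One workaround is to compare shapes too and copy a smaller saved tensor into the leading slice of the freshly initialized value. The sketch below models the checkpoint as a plain dict of NumPy arrays; restore_partial is a hypothetical helper, not a TensorFlow API. In TF1 the saved arrays could come from reader.get_tensor(name), and the merged values could be written back with each variable's assign() op.

```python
import numpy as np

def restore_partial(checkpoint, current):
    """Hypothetical helper: merge saved arrays into freshly initialized ones.

    checkpoint: dict name -> saved np.ndarray
    current:    dict name -> freshly initialized np.ndarray
    Returns a new dict. A saved array whose shape fits inside the current
    shape is copied into the leading slice; everything else keeps its
    fresh initialization.
    """
    merged = {}
    for name, value in current.items():
        value = value.copy()
        saved = checkpoint.get(name)
        if saved is not None and saved.ndim == value.ndim and all(
            s <= v for s, v in zip(saved.shape, value.shape)
        ):
            # e.g. rows [:1] of a [2, 2] kernel receive a saved [1, 2] kernel
            region = tuple(slice(0, s) for s in saved.shape)
            value[region] = saved
        merged[name] = value
    return merged

# Values printed by the first script ("Network 1")
checkpoint = {
    "restore/h1/kernel": np.array([[-0.7695515, 1.2254907]], dtype=np.float32),
    "restore/h1/bias": np.array([0.0, 0.0], dtype=np.float32),
}

# Fresh initialization for "Network 2" (z_dim = 2); values are illustrative
rng = np.random.default_rng(1)
current = {
    "restore/h1/kernel": rng.normal(0.0, 0.1, size=(2, 2)).astype(np.float32),
    "restore/h1/bias": np.ones(2, dtype=np.float32),
}

merged = restore_partial(checkpoint, current)
print(merged["restore/h1/kernel"])
```

Row 0 of the merged kernel is (w_11, w_21) from "Network 1", row 1 keeps its new initialization for (w_12, w_22), and the bias is restored in full because its shape did not change.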

Thanks a lot for your input.

Answer 1

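TensorFlow 2 is about to be released, and it makes tf.keras the official high-level API. In fact, tf.layers is deprecated in favor of tf.keras.layers. Even if you are still on TensorFlow 1, you should use tf.keras: it makes everything much simpler and, contrary to popular belief, it is very flexible (you can customize anything, even the training loop).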

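Here is an example that creates a model and then reuses its first layer. You can either reuse the layer object directly (but then the two models actually share the layer, so training model 2 will affect model 1, and vice versa), or create a new layer and copy its weights.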

import tensorflow as tf
from tensorflow import keras
import numpy as np

X_train, X_test, X_new = np.random.randn(3, 100, 2)
y_train, y_test, y_new = np.random.rand(3, 100, 1)

# Build model 1
hidden1 = keras.layers.Dense(5, activation="relu", input_shape=[2])
output1 = keras.layers.Dense(1)
model1 = keras.models.Sequential([hidden1, output1])

# Train model 1
model1.compile(loss="mse", optimizer="sgd")
history = model1.fit(X_train, y_train, epochs=10)

# Evaluate and use model 1
score = model1.evaluate(X_test, y_test)
y_pred = model1.predict(X_new)

# Build model 2, sharing the first layer with model 1
hidden2 = hidden1
output2 = keras.layers.Dense(1)
model2 = keras.models.Sequential([hidden2, output2])

# Alternatively, create a new layer and copy its weights
hidden2 = keras.layers.Dense(5, activation="relu", input_shape=[2])
output2 = keras.layers.Dense(1)
model2 = keras.models.Sequential([hidden2, output2])
hidden2.set_weights(hidden1.get_weights())

If you really want to stick with old-style TensorFlow, you can use the assign() operation to set any variable to any value:

import tensorflow as tf

v1 = tf.Variable(1.0)
v2 = tf.Variable(2.0)
assign_op = v2.assign(v1)
init = tf.global_variables_initializer()

with tf.Session() as sess:
    init.run()
    print("Before:")
    print("v1 =", sess.run(v1))
    print("v2 =", sess.run(v2))
    print()
    sess.run(assign_op)
    print("After:")
    print("v1 =", sess.run(v1))
    print("v2 =", sess.run(v2))

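You would need to iterate over all the variables you want to copy and create an assign operation for each one, perhaps grouping them with tf.group(), and then run that grouped operation. But why bother? TensorFlow is much better now, and you should use the new style.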

Hope this helps, and welcome to StackOverflow (SO)!

Edit

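If you want to copy a subset of the weights from a layer of model 1 to a new layer of model 2, you can do it as follows. In this example, I copy only the weights and biases of the first 3 of the 5 neurons in layer 1.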

In the code above, instead of:

hidden2.set_weights(hidden1.get_weights())

use this code:

weights1, biases1 = hidden1.get_weights()
weights2, biases2 = hidden2.get_weights()
weights2[:, :3] = weights1[:, :3]
biases2[:3] = biases1[:3]
hidden2.set_weights([weights2, biases2])
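The edit slices columns because a Dense kernel has shape [n_inputs, n_units], so each column belongs to one neuron. In the question's setting, "Network 2" adds an input rather than neurons, so the same get_weights()/set_weights() pattern would use a row slice instead. A NumPy sketch of both slicings, with random arrays standing in for what get_weights() returns (the shapes are assumptions matching each case):

```python
import numpy as np

rng = np.random.default_rng(2)

# Answer's case: both layers have 2 inputs and 5 neurons; copy neurons 0..2.
weights1, biases1 = rng.normal(size=(2, 5)), rng.normal(size=5)
weights2, biases2 = rng.normal(size=(2, 5)), rng.normal(size=5)
weights2[:, :3] = weights1[:, :3]  # columns = neurons
biases2[:3] = biases1[:3]

# Question's case: Network 1 has 1 input, Network 2 has 2 inputs, both 2 neurons.
kernel_net1, bias_net1 = rng.normal(size=(1, 2)), rng.normal(size=2)
kernel_net2, bias_net2 = rng.normal(size=(2, 2)), rng.normal(size=2)
kernel_net2[:1, :] = kernel_net1  # row 0 = weights from z_1, i.e. (w_11, w_21)
bias_net2[:] = bias_net1          # every neuron's bias carries over
# kernel_net2[1, :], i.e. (w_12, w_22), keeps its own initialization
```

With real layers, the modified arrays would then go back through set_weights([kernel_net2, bias_net2]) on the new layer.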

How to restore a subset of weights in a fully connected layer?
