Question

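Consider a (dense) layer with 10 units. I now want to add another (single) unit to this layer. But how do I make sure that the first 10 weights are not trained and only the new weights are trained?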

Answer 1

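You can pass different variable lists to different optimizers: one to train the first 10 units and another to train the additional unit. Combined with tf.cond(), this gives you two branches in the graph, one for when you want to use only the 10 neurons and one for all 11. You could also use tf.stop_gradient(); which option fits best really depends on your use case.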

Without more information it is hard to answer precisely, but something along these lines:

import numpy as np
import tensorflow as tf  # TF 1.x graph API; on TF 2.x use tensorflow.compat.v1 and call tf.disable_v2_behavior()

# True selects the original 10-unit branch, False the widened 11-unit branch
choose_branch = tf.placeholder(tf.bool)
x = tf.placeholder(tf.float32, shape=[None, 1])
target = tf.placeholder(tf.float32, shape=[None, 1])

# original layer: 10 hidden units and their output projection
w_old = tf.Variable(np.random.normal(size=(1, 10)).astype("float32"))
b = tf.Variable(np.zeros((1)).astype("float32"))
w_proj_old = tf.Variable(np.random.normal(size=(10, 1)).astype("float32"))

hidden_old = tf.nn.relu(tf.matmul(x, w_old) + b)
y_old = tf.matmul(hidden_old, w_proj_old)
cost_old = tf.reduce_mean(tf.square(y_old - target))

# weights of the single additional unit
w_plus = tf.Variable(np.random.normal(size=(1, 1)).astype("float32"))
w_proj_plus = tf.Variable(np.random.normal(size=(1, 1)).astype("float32"))

# widened layer: (1, 10) + (1, 1) -> (1, 11) and (10, 1) + (1, 1) -> (11, 1)
w_new = tf.concat([w_old, w_plus], axis=1)
w_proj_new = tf.concat([w_proj_old, w_proj_plus], axis=0)

hidden_new = tf.nn.relu(tf.matmul(x, w_new) + b)
y_new = tf.matmul(hidden_new, w_proj_new)
cost_new = tf.reduce_mean(tf.square(y_new - target))

opt_old = tf.train.GradientDescentOptimizer(0.001)
opt_new = tf.train.GradientDescentOptimizer(0.0001)

# each optimizer only updates the variables listed in its var_list
train_step_old = opt_old.minimize(cost_old, var_list=[w_old, b, w_proj_old])
train_step_new = opt_new.minimize(cost_new, var_list=[w_plus, w_proj_plus])
y = tf.cond(choose_branch, lambda: y_old, lambda: y_new)

Be careful: this code has not been tested.
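What restricting an optimizer to a var_list (or wrapping the old weights in tf.stop_gradient()) is meant to achieve can be illustrated without TensorFlow. The following is a minimal NumPy sketch, not the code from this answer: the toy data, the `trainable` mask, the per-unit bias, and the helper `masked_sgd_step` are all assumptions made for illustration. It computes the gradient of the whole widened layer by hand and zeroes it for the ten old units before applying the update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy regression data: 32 samples, 1 input, 1 target.
x = rng.normal(size=(32, 1))
target = np.sin(x)

# Widened layer: units 0..9 stand for the old units, unit 10 is the new one.
w = rng.normal(size=(1, 11))
b = np.zeros(11)
w_proj = rng.normal(size=(11, 1))

# 1.0 marks a trainable unit, 0.0 a frozen one.
trainable = np.zeros(11)
trainable[10] = 1.0


def masked_sgd_step(w, b, w_proj, lr=0.01):
    """One gradient-descent step on the mean squared error, new unit only."""
    pre = x @ w + b
    hidden = np.maximum(pre, 0.0)          # ReLU
    err = hidden @ w_proj - target
    d_y = 2.0 * err / err.size             # d(cost)/d(y)
    d_w_proj = hidden.T @ d_y
    d_pre = (d_y @ w_proj.T) * (pre > 0)   # backprop through the ReLU
    d_w = x.T @ d_pre
    d_b = d_pre.sum(axis=0)
    # Zero the gradient of every frozen unit before applying the update.
    return (w - lr * d_w * trainable,
            b - lr * d_b * trainable,
            w_proj - lr * d_w_proj * trainable[:, None])


w2, b2, w_proj2 = masked_sgd_step(w, b, w_proj)
```

After the step, the first ten columns of `w2` and the first ten rows of `w_proj2` are identical to the originals; only the entries belonging to the new unit have moved.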

Answer 2

For this kind of advanced operation, I find it easier to switch to TensorFlow's lower-level API.

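A fully connected layer is usually defined as a matrix multiplication. For example, suppose your previous layer has 128 features and you want to implement a fully connected layer with 256 features. You could write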

import tensorflow as tf  # TF 1.x graph API

batch_size = 128
num_in = 128
num_out = 256

h = tf.zeros((batch_size, num_in)) # actually, the output of the previous layer

# create the weights for the fully connected layer
w = tf.get_variable('w', shape=(num_in, num_out))
# compute the output
y = tf.matmul(h, w)
# -- feel free to add biases and activation

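Now suppose you have trained w and want to add some extra neurons to this layer. You can create an additional variable to hold the extra weights and concatenate it with the existing one.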

num_out_extra = 10
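# now w is set to trainable=False, we don't want its values to change
# (this replaces the earlier definition of w; creating 'w' twice in the same graph without reuse raises an error)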
w = tf.get_variable('w', shape=(num_in, num_out), trainable=False)
# our new weights
w_extra = tf.get_variable('w_extra', shape=(num_in, num_out_extra))
w_total = tf.concat([w, w_extra], axis=-1)
y = tf.matmul(h, w_total)
# now y has 266 features

Of course, you still need to initialize all the weights somehow.
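One possible way to handle the initialization, sketched here in NumPy under stated assumptions: the random values of `w` stand in for restored trained weights, and the next layer `v` with its extension `v_extra` is hypothetical (the answer does not define one). If the outgoing weights of the new neurons start at zero, the widened network initially computes the same output as the old one, so training the new weights starts from the already-trained behaviour.

```python
import numpy as np

rng = np.random.default_rng(0)
num_in, num_out, num_out_extra = 128, 256, 10

# Stand-in for the previous layer's output (4 samples).
h = rng.normal(size=(4, num_in))

# Stand-ins for already-trained values; in practice these would be restored.
w = rng.normal(size=(num_in, num_out))
v = rng.normal(size=(num_out, 1))        # hypothetical next layer

# New weights: small random incoming weights, zero outgoing weights.
w_extra = 0.01 * rng.normal(size=(num_in, num_out_extra))
v_extra = np.zeros((num_out_extra, 1))

w_total = np.concatenate([w, w_extra], axis=-1)   # shape (128, 266)
v_total = np.concatenate([v, v_extra], axis=0)    # shape (266, 1)

out_before = (h @ w) @ v
out_after = (h @ w_total) @ v_total
```

Because `v_extra` is all zeros, `out_after` matches `out_before` up to floating-point rounding. The incoming weights `w_extra` are kept non-zero here so that the new neurons do not all start out identical.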
