Due to memory limitations, I had to split the two networks (a CNN and a BLSTM) in the forward pass, but I want backprop to run through both networks at once. However, it seems that only the BLSTM weights get updated, while the CNN weights stay unchanged!
My actual implementation is too long to post here, so I adapted this linear-regression TensorFlow example from GitHub, using the same procedure as in my CNN + BLSTM model:
import tensorflow as tf
import numpy

tf.reset_default_graph()
rng = numpy.random

# Parameters
learning_rate = 0.01
training_epochs = 25
display_step = 1

# Training Data
train_X = numpy.asarray([3.3, 4.4, 5.5, 6.71, 6.93, 4.168, 9.779, 6.182, 7.59, 2.167,
                         7.042, 10.791, 5.313, 7.997, 5.654, 9.27, 3.1])
train_Y = numpy.asarray([1.7, 2.76, 2.09, 3.19, 1.694, 1.573, 3.366, 2.596, 2.53, 1.221,
                         2.827, 3.465, 1.65, 2.904, 2.42, 2.94, 1.3])
n_samples = train_X.shape[0]

# tf Graph Input
X1 = tf.placeholder("float")
X2 = tf.placeholder("float")
Y = tf.placeholder("float")

with tf.variable_scope('net1'):
    # Set model weights
    W1 = tf.Variable(rng.randn(), name="weight1")
    b1 = tf.Variable(rng.randn(), name="bias1")
    # Construct a linear model1
    pred1 = tf.add(tf.multiply(X1, W1), b1)

var1 = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='net1')

with tf.variable_scope('net2'):
    # Set model weights
    W2 = tf.Variable(rng.randn(), name="weight2")
    b2 = tf.Variable(rng.randn(), name="bias2")
    # Construct a linear model2
    pred2 = tf.add(tf.multiply(X2, W2), b2)

# Mean squared error
cost = tf.reduce_sum(tf.pow(pred2 - Y, 2)) / (2 * n_samples)

# Gradient descent over both variable lists
var2 = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='net2')
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost, var_list=var1 + var2)

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()

# Start training
with tf.Session() as sess:
    # Run the initializer
    sess.run(init)

    # Fit all training data
    for epoch in range(training_epochs):
        for (x, y) in zip(train_X, train_Y):
            feat = sess.run(pred1, feed_dict={X1: x})
            sess.run(optimizer, feed_dict={X2: feat, Y: y})

        # Display logs per epoch step
        if (epoch + 1) % display_step == 0:
            c = sess.run(cost, feed_dict={X2: train_X, Y: train_Y})
            print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(c),
                  "W1=", sess.run(W1), "b1=", sess.run(b1),
                  "W2=", sess.run(W2), "b2=", sess.run(b2))
Output:
Epoch: 0001 cost= 1.897011757 W1= 2.462 b1= 2.34888 W2= 0.148642 b2= -0.452278
Epoch: 0002 cost= 1.914609313 W1= 2.462 b1= 2.34888 W2= 0.147074 b2= -0.451319
Epoch: 0003 cost= 1.913563490 W1= 2.462 b1= 2.34888 W2= 0.146999 b2= -0.450277
Epoch: 0004 cost= 1.912214637 W1= 2.462 b1= 2.34888 W2= 0.146948 b2= -0.449235
Epoch: 0005 cost= 1.910862923 W1= 2.462 b1= 2.34888 W2= 0.146898 b2= -0.448194
Epoch: 0006 cost= 1.909512877 W1= 2.462 b1= 2.34888 W2= 0.146847 b2= -0.447153
Epoch: 0007 cost= 1.908164740 W1= 2.462 b1= 2.34888 W2= 0.146797 b2= -0.446114
Epoch: 0008 cost= 1.906818390 W1= 2.462 b1= 2.34888 W2= 0.146747 b2= -0.445076
Epoch: 0009 cost= 1.905474305 W1= 2.462 b1= 2.34888 W2= 0.146696 b2= -0.444039
Epoch: 0010 cost= 1.904131770 W1= 2.462 b1= 2.34888 W2= 0.146646 b2= -0.443003
Epoch: 0011 cost= 1.902791142 W1= 2.462 b1= 2.34888 W2= 0.146596 b2= -0.441968
Epoch: 0012 cost= 1.901452661 W1= 2.462 b1= 2.34888 W2= 0.146546 b2= -0.440934
Epoch: 0013 cost= 1.900115728 W1= 2.462 b1= 2.34888 W2= 0.146496 b2= -0.439901
Epoch: 0014 cost= 1.898780823 W1= 2.462 b1= 2.34888 W2= 0.146446 b2= -0.438869
Epoch: 0015 cost= 1.897448182 W1= 2.462 b1= 2.34888 W2= 0.146396 b2= -0.437838
Epoch: 0016 cost= 1.896116853 W1= 2.462 b1= 2.34888 W2= 0.146346 b2= -0.436808
Epoch: 0017 cost= 1.894787788 W1= 2.462 b1= 2.34888 W2= 0.146296 b2= -0.43578
Epoch: 0018 cost= 1.893460274 W1= 2.462 b1= 2.34888 W2= 0.146246 b2= -0.434752
Epoch: 0019 cost= 1.892134905 W1= 2.462 b1= 2.34888 W2= 0.146196 b2= -0.433725
Epoch: 0020 cost= 1.890811205 W1= 2.462 b1= 2.34888 W2= 0.146147 b2= -0.4327
Epoch: 0021 cost= 1.889489293 W1= 2.462 b1= 2.34888 W2= 0.146097 b2= -0.431675
Epoch: 0022 cost= 1.888169646 W1= 2.462 b1= 2.34888 W2= 0.146047 b2= -0.430651
Epoch: 0023 cost= 1.886851430 W1= 2.462 b1= 2.34888 W2= 0.145998 b2= -0.429629
Epoch: 0024 cost= 1.885535717 W1= 2.462 b1= 2.34888 W2= 0.145948 b2= -0.428607
Epoch: 0025 cost= 1.884221435 W1= 2.462 b1= 2.34888 W2= 0.145899 b2= -0.427587
The trick I tried was to pass both variable lists to the optimizer with tf.train.GradientDescentOptimizer(learning_rate).minimize(cost, var_list=var1+var2), but it seems that only W2 and b2 get updated, while W1 and b1 stay the same. I am not sure whether this task is even feasible, or whether there is a problem with my implementation.
Update:
The only (indirect) connection between net1 and net2 happens in the training loop:

feat = sess.run(pred1, feed_dict={X1: x})
sess.run(optimizer, feed_dict={X2: feat, Y: y})

I take the output of net1 and feed it as the input of net2, which is exactly what I need. I cannot make this connection inside the graph, because in my original CNN + BLSTM model the two networks use different batch sizes, so they have to stay separate in the forward pass.
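For what it's worth, here is a check that makes the problem visible on the toy graph above (my own addition, not part of the original post): because the hand-off between the two networks goes through sess.run / feed_dict, the graph contains no path from cost back to net1, and TensorFlow reports the gradients of W1 and b1 as None.

# Diagnostic on the toy graph above: ask for the gradients of the cost with
# respect to each variable group. cost is built only from the X2 placeholder
# and the net2 variables, so it has no dependency on W1/b1.
grads_net1 = tf.gradients(cost, [W1, b1])   # -> [None, None], nothing for the optimizer to apply
grads_net2 = tf.gradients(cost, [W2, b2])   # -> actual gradient tensors
print(grads_net1, grads_net2)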
Answer 0 (score: 0)
The output of the first network (pred1) does not appear anywhere in the cost computation. Therefore the gradient of the cost with respect to the variables involved in computing it is zero, because changing them does not change the cost at all. Your first network is completely useless in this setup.
If you want the net1 variables to be updated, you need to use them somewhere. For example, you could use the average of pred1 and pred2 as the network output.