Is it possible to train two separate networks based on the loss of one of them? (TensorFlow custom optimization)

Time: 2018-06-18 05:56:18

Tags: python tensorflow optimization loss

Due to memory constraints, I had to separate the two networks (a CNN and a BLSTM) in the forward prop, and run backprop on both networks at the same time. However, it seems that only the BLSTM weights get updated, while the CNN weights stay unchanged!

My actual implementation is too long to post here, but I have modified this regression TF example from GitHub and used the same procedure as in my CNN + BLSTM model:

import tensorflow as tf
import numpy

tf.reset_default_graph()
rng = numpy.random

# Parameters
learning_rate = 0.01
training_epochs = 25
display_step = 1

# Training Data
train_X = numpy.asarray([3.3,4.4,5.5,6.71,6.93,4.168,9.779,6.182,7.59,2.167,
                         7.042,10.791,5.313,7.997,5.654,9.27,3.1])
train_Y = numpy.asarray([1.7,2.76,2.09,3.19,1.694,1.573,3.366,2.596,2.53,1.221,
                         2.827,3.465,1.65,2.904,2.42,2.94,1.3])
n_samples = train_X.shape[0]

# tf Graph Input
X1 = tf.placeholder("float")
X2 = tf.placeholder("float")
Y = tf.placeholder("float")

# Set model weights
with tf.variable_scope('net1'):
    W1 = tf.Variable(rng.randn(), name="weight1")
    b1 = tf.Variable(rng.randn(), name="bias1")
    # Construct a linear model1
    pred1 = tf.add(tf.multiply(X1, W1), b1)

var1 = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='net1')

with tf.variable_scope('net2'):
    # Set model weights
    W2 = tf.Variable(rng.randn(), name="weight2")
    b2 = tf.Variable(rng.randn(), name="bias2")
    # Construct a linear model2
    pred2 = tf.add(tf.multiply(X2, W2), b2)

# Mean squared error
cost = tf.reduce_sum(tf.pow(pred2-Y, 2))/(2*n_samples)

# Gradient descent
var2 = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='net2')
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost, var_list=var1+var2)

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()

# Start training
with tf.Session() as sess:

    # Run the initializer
    sess.run(init)

    # Fit all training data
    for epoch in range(training_epochs):
        for (x, y) in zip(train_X, train_Y):
            feat = sess.run(pred1, feed_dict={X1: x})
            sess.run(optimizer, feed_dict={X2: feat, Y: y})

        # Display logs per epoch step
        if (epoch+1) % display_step == 0:
            c = sess.run(cost, feed_dict={X2: train_X, Y:train_Y})
            print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(c), \
                "W1=", sess.run(W1), "b1=", sess.run(b1), \
                "W2=", sess.run(W2), "b2=", sess.run(b2))

Output:

Epoch: 0001 cost= 1.897011757 W1= 2.462 b1= 2.34888 W2= 0.148642 b2= -0.452278
Epoch: 0002 cost= 1.914609313 W1= 2.462 b1= 2.34888 W2= 0.147074 b2= -0.451319
Epoch: 0003 cost= 1.913563490 W1= 2.462 b1= 2.34888 W2= 0.146999 b2= -0.450277
Epoch: 0004 cost= 1.912214637 W1= 2.462 b1= 2.34888 W2= 0.146948 b2= -0.449235
Epoch: 0005 cost= 1.910862923 W1= 2.462 b1= 2.34888 W2= 0.146898 b2= -0.448194
Epoch: 0006 cost= 1.909512877 W1= 2.462 b1= 2.34888 W2= 0.146847 b2= -0.447153
Epoch: 0007 cost= 1.908164740 W1= 2.462 b1= 2.34888 W2= 0.146797 b2= -0.446114
Epoch: 0008 cost= 1.906818390 W1= 2.462 b1= 2.34888 W2= 0.146747 b2= -0.445076
Epoch: 0009 cost= 1.905474305 W1= 2.462 b1= 2.34888 W2= 0.146696 b2= -0.444039
Epoch: 0010 cost= 1.904131770 W1= 2.462 b1= 2.34888 W2= 0.146646 b2= -0.443003
Epoch: 0011 cost= 1.902791142 W1= 2.462 b1= 2.34888 W2= 0.146596 b2= -0.441968
Epoch: 0012 cost= 1.901452661 W1= 2.462 b1= 2.34888 W2= 0.146546 b2= -0.440934
Epoch: 0013 cost= 1.900115728 W1= 2.462 b1= 2.34888 W2= 0.146496 b2= -0.439901
Epoch: 0014 cost= 1.898780823 W1= 2.462 b1= 2.34888 W2= 0.146446 b2= -0.438869
Epoch: 0015 cost= 1.897448182 W1= 2.462 b1= 2.34888 W2= 0.146396 b2= -0.437838
Epoch: 0016 cost= 1.896116853 W1= 2.462 b1= 2.34888 W2= 0.146346 b2= -0.436808
Epoch: 0017 cost= 1.894787788 W1= 2.462 b1= 2.34888 W2= 0.146296 b2= -0.43578
Epoch: 0018 cost= 1.893460274 W1= 2.462 b1= 2.34888 W2= 0.146246 b2= -0.434752
Epoch: 0019 cost= 1.892134905 W1= 2.462 b1= 2.34888 W2= 0.146196 b2= -0.433725
Epoch: 0020 cost= 1.890811205 W1= 2.462 b1= 2.34888 W2= 0.146147 b2= -0.4327
Epoch: 0021 cost= 1.889489293 W1= 2.462 b1= 2.34888 W2= 0.146097 b2= -0.431675
Epoch: 0022 cost= 1.888169646 W1= 2.462 b1= 2.34888 W2= 0.146047 b2= -0.430651
Epoch: 0023 cost= 1.886851430 W1= 2.462 b1= 2.34888 W2= 0.145998 b2= -0.429629
Epoch: 0024 cost= 1.885535717 W1= 2.462 b1= 2.34888 W2= 0.145948 b2= -0.428607
Epoch: 0025 cost= 1.884221435 W1= 2.462 b1= 2.34888 W2= 0.145899 b2= -0.427587

The trick I am trying is to pass both variable lists to the optimizer, tf.train.GradientDescentOptimizer(learning_rate).minimize(cost, var_list=var1+var2), but it seems that only W2 and b2 get updated, while W1 and b1 remain unchanged. I am not sure whether this is feasible at all, or whether there is a problem with my implementation?
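(As a sanity check on the var_list itself, it can help to print which variables each collection actually holds; the names shown below are what the variable scopes in the script above should produce, so treat this as an assumed diagnostic rather than part of the original code.)

# Confirm which trainable variables each scope-filtered collection contains
print([v.name for v in var1])  # expected something like ['net1/weight1:0', 'net1/bias1:0']
print([v.name for v in var2])  # expected something like ['net2/weight2:0', 'net2/bias2:0']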

Update:

The only indirect connection between net1 and net2 happens in the training loop:

feat = sess.run(pred1, feed_dict={X1: x})
sess.run(optimizer, feed_dict={X2: feat, Y: y})

I take the feature from net1 and use it as the input to net2; that is the connection I need. I cannot make this connection inside the graph, because in my original CNN + BLSTM model the two networks use different batch sizes, so they have to be separated in the forward prop.

1 Answer:

Answer 0 (score: 0)

The output of the first network (pred1) does not appear anywhere in the cost computation. The gradient of the cost with respect to the variables involved in computing it is therefore zero, because changing them does not change the cost at all. In this setup, your first network is completely useless.
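A quick way to check this against the toy script above (a sketch that assumes the question's graph has already been built): asking TensorFlow for the gradients of cost with respect to the net1 variables returns None, because feeding pred1's output back in through feed_dict turns it into a plain numpy value with no path to the cost.

# Gradients of the cost w.r.t. all four variables; disconnected ones come back as None
grads = tf.gradients(cost, var1 + var2)
print(grads)  # [None, None, <gradient for W2>, <gradient for b2>]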

If you want the net1 variables to be updated, you need to use them somewhere. For example, you could use the average of pred1 and pred2 as the output of your network.
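A minimal sketch of that suggestion, applied to the toy example above (an assumption about how to wire it up, not the asker's actual model; note that both X1 and X2 then have to be fed in the same session.run call):

# Use the mean of both predictions, so the cost depends on W1/b1 as well as W2/b2
pred = 0.5 * (pred1 + pred2)
cost = tf.reduce_sum(tf.pow(pred - Y, 2)) / (2 * n_samples)
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(
    cost, var_list=var1 + var2)  # gradients now flow to both variable lists

# The training step then feeds both inputs, e.g.:
# sess.run(optimizer, feed_dict={X1: x, X2: x, Y: y})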