I am curious whether there is a good way to share the weights of different RNN cells while still feeding each cell a different input.
The graph I would like to build looks like this:
There are three orange LSTM cells operating in parallel, between which I would like to share the weights.
I have managed to implement something similar to what I want using a placeholder (see the code below). However, using the placeholder breaks the optimizer's gradient computation, and nothing upstream of the point where I use the placeholder gets trained. Is there a better way to achieve this in TensorFlow?
I am using TensorFlow 1.2 and Python 3.5 in an Anaconda environment on Windows 7.
Code:
def ann_model(cls, data, act=tf.nn.relu):
    with tf.name_scope('ANN'):
        with tf.name_scope('ann_weights'):
            ann_weights = tf.Variable(tf.random_normal([1,
                                                        cls.n_ann_nodes]))
        with tf.name_scope('ann_bias'):
            ann_biases = tf.Variable(tf.random_normal([1]))
        out = act(tf.matmul(data, ann_weights) + ann_biases)
    return out
def rnn_lower_model(cls, data):
    with tf.name_scope('RNN_Model'):
        data_tens = tf.split(data, cls.sequence_length, 1)
        for i in range(len(data_tens)):
            data_tens[i] = tf.reshape(data_tens[i], [cls.batch_size,
                                                     cls.n_rnn_inputs])

        rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(cls.n_rnn_nodes_lower)

        outputs, states = tf.contrib.rnn.static_rnn(rnn_cell,
                                                    data_tens,
                                                    dtype=tf.float32)

        with tf.name_scope('RNN_out_weights'):
            out_weights = tf.Variable(
                tf.random_normal([cls.n_rnn_nodes_lower, 1]))
        with tf.name_scope('RNN_out_biases'):
            out_biases = tf.Variable(tf.random_normal([1]))

        # Encode the output of the RNN into one estimate per entry in
        # the input sequence
        predict_list = []
        for i in range(cls.sequence_length):
            predict_list.append(tf.matmul(outputs[i],
                                          out_weights)
                                + out_biases)
    return predict_list
def create_graph(cls, sess):
    # Initializes the graph
    with tf.name_scope('input'):
        cls.x = tf.placeholder('float', [cls.batch_size,
                                         cls.sequence_length,
                                         cls.n_inputs])
    with tf.name_scope('labels'):
        cls.y = tf.placeholder('float', [cls.batch_size, 1])
    with tf.name_scope('community_id'):
        cls.c = tf.placeholder('float', [cls.batch_size, 1])

    # Define placeholder to provide variable input into the
    # RNNs with shared weights
    cls.input_place = tf.placeholder('float', [cls.batch_size,
                                               cls.sequence_length,
                                               cls.n_rnn_inputs])

    # Global step used in optimizer
    global_step = tf.Variable(0, trainable=False)

    # Create ANN
    ann_output = cls.ann_model(cls.c)
    # Combine output of ANN with other input data x
    ann_out_seq = tf.reshape(tf.concat([ann_output for _ in
                                        range(cls.sequence_length)], 1),
                             [cls.batch_size,
                              cls.sequence_length,
                              cls.n_ann_nodes])
    cls.rnn_input = tf.concat([ann_out_seq, cls.x], 2)

    # Create 'unrolled' RNN by creating sequence_length many RNN cells
    # that share the same weights.
    with tf.variable_scope('Lower_RNNs'):
        # Create RNNs
        daily_prediction, daily_prediction1 = [cls.rnn_lower_model(cls.input_place)] * 2
A training mini-batch is computed in two steps (the first sess.run materializes cls.rnn_input as a NumPy array, which is why gradients cannot flow back past cls.input_place):
RNNinput = sess.run(cls.rnn_input, feed_dict={cls.x: batch_x,
                                              cls.y: batch_y,
                                              cls.c: batch_c})

_ = sess.run(cls.optimizer, feed_dict={cls.input_place: RNNinput,
                                       cls.y: batch_y,
                                       cls.x: batch_x,
                                       cls.c: batch_c})
Thanks for your help. Any ideas would be appreciated.
Answer 0 (score: 2)
You have three different inputs: input_1, input_2, input_3. You feed them into an LSTM model whose parameters are shared, then concatenate the outputs of the three LSTMs and pass the result to a final LSTM layer. The code would look something like this:
# Create input placeholders for the network
input_1 = tf.placeholder(...)
input_2 = tf.placeholder(...)
input_3 = tf.placeholder(...)

# Create a shared rnn layer
def shared_rnn(...):
    ...
    rnn_cell = tf.nn.rnn_cell.BasicLSTMCell(...)
    ...

# Generate the outputs for each input
with tf.variable_scope('lower_lstm') as scope:
    out_input_1 = shared_rnn(...)
    scope.reuse_variables()  # the variables will be reused
    out_input_2 = shared_rnn(...)
    scope.reuse_variables()
    out_input_3 = shared_rnn(...)

# Verify whether the variables are reused
for v in tf.global_variables():
    print(v.name)

# Concat the three outputs
output = tf.concat(...)

# Pass it to the final_lstm layer and output the logits
logits = final_layer(output, ...)

train_op = ...

# Train
sess.run(train_op, feed_dict={input_1: in1, input_2: in2, input_3: in3, labels: ...})
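For concreteness, here is a minimal runnable sketch of this variable-reuse pattern with the elided pieces filled in. All shapes, and the choice of taking the last time-step output, are made-up assumptions for illustration:

import tensorflow as tf

# Hypothetical dimensions, chosen only for illustration
batch_size, seq_len, n_in, n_hidden = 4, 5, 3, 8

def shared_rnn(x):
    # static_rnn expects a list of [batch, n_in] tensors, one per time step
    x_steps = tf.unstack(x, seq_len, axis=1)
    cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden)
    outputs, state = tf.contrib.rnn.static_rnn(cell, x_steps, dtype=tf.float32)
    return outputs[-1]  # last time-step output

input_1 = tf.placeholder(tf.float32, [batch_size, seq_len, n_in])
input_2 = tf.placeholder(tf.float32, [batch_size, seq_len, n_in])
input_3 = tf.placeholder(tf.float32, [batch_size, seq_len, n_in])

with tf.variable_scope('lower_lstm') as scope:
    out_1 = shared_rnn(input_1)
    scope.reuse_variables()  # later calls reuse the same kernel/bias
    out_2 = shared_rnn(input_2)
    out_3 = shared_rnn(input_3)  # reuse is still active in this scope

# Exactly one set of lower_lstm LSTM variables should be printed
for v in tf.global_variables():
    print(v.name)

# The three outputs stay connected to the graph, so training through
# them keeps gradients flowing into the shared cell
output = tf.concat([out_1, out_2, out_3], axis=1)  # [batch, 3*n_hidden]

Because everything stays in one graph, no intermediate sess.run or placeholder round-trip is needed, which is what was cutting the gradient path in the question's code.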
Answer 1 (score: 0)
I ended up rethinking my architecture and came up with a more workable solution.
Instead of duplicating the middle layer of LSTM cells to create three different cells with the same weights, I chose to run the same cell three times. The result of each run is stored in a 'buffer', like a tf.Variable, and that whole variable is then used as the input to the final LSTM layer. I drew a diagram here
Implementing it this way allows for valid outputs after 3 time steps, and does not break TensorFlow's backpropagation algorithm (i.e. the nodes in the ANN can still be trained).
The only tricky thing was making sure the buffer was in the correct sequential order for the final RNN.
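The post does not show the buffer code, so here is a minimal sketch of one way to realize the idea. Note that I collect the runs in a plain Python list and use tf.stack in place of a tf.Variable buffer (a Variable written via assign ops would cut the gradient path, just like the placeholder did); all shapes and names are illustrative assumptions:

import tensorflow as tf

# Hypothetical dimensions, for illustration only
batch_size, seq_len, n_in, n_hidden = 4, 5, 6, 8

# Three consecutive input windows, stacked along axis 1
x = tf.placeholder(tf.float32, [batch_size, 3, seq_len, n_in])

cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden)

buffer = []  # ordered buffer of the three runs' outputs
with tf.variable_scope('shared_cell') as scope:
    for i in range(3):
        if i > 0:
            scope.reuse_variables()  # same cell weights on every run
        steps = tf.unstack(x[:, i], seq_len, axis=1)
        outputs, _ = tf.contrib.rnn.static_rnn(cell, steps, dtype=tf.float32)
        buffer.append(outputs[-1])

# Stacking keeps the graph connected (gradients still flow), and the
# list order fixes the time order seen by the final RNN
final_input = tf.stack(buffer, axis=1)  # [batch, 3, n_hidden]

with tf.variable_scope('final_lstm'):
    final_cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden)
    final_out, _ = tf.contrib.rnn.static_rnn(final_cell,
                                             tf.unstack(final_input, 3, axis=1),
                                             dtype=tf.float32)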