How can I use the previous output and hidden state of an LSTM in an attention mechanism?

Time: 2018-02-06 23:15:43

Tags: tensorflow machine-learning lstm recurrent-neural-network attention-model

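I am currently trying to implement the attention mechanism from this paper: "Effective Approaches to Attention-based Neural Machine Translation", Luong, Pham, Manning (2015). (I use global attention with the dot score.)

However, I am not sure how to feed the hidden and output states back into the LSTM decoder. The problem is that the decoder's input at time t depends on quantities that I need to compute from the output and hidden state at time t-1.

Here is the relevant part of the code: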

with tf.variable_scope('data'):
    prob = tf.placeholder_with_default(1.0, shape=())
    X_or = tf.placeholder(shape = [batch_size, timesteps_1, num_input], dtype = tf.float32, name = "input")
    X = tf.unstack(X_or, timesteps_1, 1)
    y = tf.placeholder(shape = [window_size,1], dtype = tf.float32, name = "label_annotation")
    logits = tf.zeros((1,1), tf.float32)

with tf.variable_scope('lstm_cell_encoder'):
    rnn_layers = [tf.nn.rnn_cell.LSTMCell(size) for size in [hidden_size, hidden_size]]
    multi_rnn_cell = tf.nn.rnn_cell.MultiRNNCell(rnn_layers)
    lstm_outputs, lstm_state =  tf.contrib.rnn.static_rnn(cell=multi_rnn_cell,inputs=X,dtype=tf.float32)
    concat_lstm_outputs = tf.stack(tf.squeeze(lstm_outputs))
    last_encoder_state = lstm_state[-1]

with tf.variable_scope('lstm_cell_decoder'):

    initial_input = tf.unstack(tf.zeros(shape=(1,1,hidden_size2)))
    rnn_decoder_cell = tf.nn.rnn_cell.LSTMCell(hidden_size, state_is_tuple = True)
    # Compute the hidden and output of h_1

    for index in range(window_size):

        output_decoder, state_decoder = tf.nn.static_rnn(rnn_decoder_cell, initial_input, initial_state=last_encoder_state, dtype=tf.float32)

        # Compute the score for source output vector
        scores = tf.matmul(concat_lstm_outputs, tf.reshape(output_decoder[-1],(hidden_size,1)))
        attention_coef = tf.nn.softmax(scores)
        context_vector = tf.reduce_sum(tf.multiply(concat_lstm_outputs, tf.reshape(attention_coef, (window_size, 1))),0)
        context_vector = tf.reshape(context_vector, (1,hidden_size))

        # compute the tilde hidden state \tilde{h}_t=tanh(W[c_t, h_t]+b_t)
        concat_context = tf.concat([context_vector, output_decoder[-1]], axis = 1)
        W_tilde = tf.Variable(tf.random_normal(shape = [hidden_size*2, hidden_size2], stddev = 0.1), name = "weights_tilde", trainable = True)
        b_tilde = tf.Variable(tf.zeros([1, hidden_size2]), name="bias_tilde", trainable = True)
        hidden_tilde = tf.nn.tanh(tf.matmul(concat_context, W_tilde)+b_tilde) # hidden_tilde is [1*64]

        # update for next time step
        initial_input = tf.unstack(tf.reshape(hidden_tilde, (1,1,hidden_size2)))
        last_encoder_state = state_decoder

        # predict the target

        W_target = tf.Variable(tf.random_normal(shape = [hidden_size2, 1], stddev = 0.1), name = "weights_target", trainable = True)
        logit = tf.matmul(hidden_tilde, W_target)
        logits = tf.concat([logits, logit], axis = 0)

    logits = logits[1:]

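The part inside the loop is what I am unsure about. When I overwrite the variable "initial_input", does TensorFlow keep track of the computation graph? And what about "last_encoder_state"?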

1 Answer:

Answer 0: (score: 1)

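I think your model would be much simpler if you used tf.contrib.seq2seq.AttentionWrapper together with one of the attention implementations, e.g. LuongAttention.

This way the attention vector is wired in at the cell level, so the cell output already has attention applied. An example adapted from the seq2seq tutorial:

cell = tf.nn.rnn_cell.LSTMCell(512)
# encoder_outputs is expected to be batch-major: [batch_size, max_time, depth]
attention_mechanism = tf.contrib.seq2seq.LuongAttention(512, encoder_outputs)
# the keyword accepted by tf.contrib.seq2seq.AttentionWrapper is attention_layer_size
attn_cell = tf.contrib.seq2seq.AttentionWrapper(cell, attention_mechanism, attention_layer_size=256)

Note that this way you do not need the loop over window_size, because tf.nn.static_rnn or tf.nn.dynamic_rnn will instantiate the attention-wrapped cells.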
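For reference, here is a minimal, framework-free sketch of what one decoder step of global attention with the dot score computes, following the formula in the question's code comment. It is an illustration only, not the internals of AttentionWrapper; the function names and the toy values are made up for this example, and plain Python lists stand in for tensors.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def luong_dot_attention_step(encoder_outputs, h_t, W, b):
    """One decoder step of global attention with the dot score.

    encoder_outputs: list of source vectors h_s, each of length d
    h_t: current decoder output, length d
    W: list of weight rows, each of length 2 * d
    b: bias, one entry per row of W
    """
    # score(h_t, h_s) = h_t . h_s for every source position
    scores = [dot(h_s, h_t) for h_s in encoder_outputs]
    # attention weights over the source positions
    weights = softmax(scores)
    # context vector c_t: weighted average of the source vectors
    d = len(h_t)
    context = [
        sum(w * h_s[i] for w, h_s in zip(weights, encoder_outputs))
        for i in range(d)
    ]
    # attentional hidden state: tanh(W [c_t; h_t] + b)
    concat = context + h_t
    h_tilde = [math.tanh(dot(row, concat) + b_i) for row, b_i in zip(W, b)]
    return weights, context, h_tilde


# Toy values (hypothetical): three source positions, d = 2, one output unit.
encoder_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h_t = [1.0, 0.0]
W = [[0.5, 0.0, 0.5, 0.0]]
b = [0.0]
weights, context, h_tilde = luong_dot_attention_step(encoder_outputs, h_t, W, b)
```

In the toy call, the first and third source vectors get the same score against h_t, so they receive equal attention weight, and the second one receives less.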

Regarding your question: you should distinguish between Python variables and TensorFlow graph nodes. You can assign last_encoder_state to a different tensor, and the original graph node will not change because of it. This is flexible, but it can also be misleading in the resulting network: you might think you have connected the LSTM to one tensor when it is actually connected to another. In general, you should not do this.
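The distinction between Python names and graph nodes can be illustrated with a small toy that does not use TensorFlow at all. The `Node` class and `step` function below are hypothetical stand-ins, assumed only for this sketch: each call to `step` creates a new node wired to whatever node the name `state` refers to at that moment, and rebinding the name afterwards leaves the existing nodes untouched.

```python
class Node:
    """Toy stand-in for a graph node: an op name plus its input nodes."""

    def __init__(self, op, inputs=()):
        self.op = op
        self.inputs = tuple(inputs)


def step(state):
    """Build a new node that consumes `state`; nothing is modified in place."""
    return Node("lstm_step", [state])


encoder_state = Node("encoder_final_state")

state = encoder_state  # a Python name bound to the encoder node
history = []
for _ in range(3):
    new_state = step(state)  # wired to the node that `state` names right now
    history.append(new_state)
    state = new_state  # rebinds the name only; earlier nodes keep their inputs

# The chain built inside the loop is still intact after all the rebinding.
first_input = history[0].inputs[0]
```

Under this picture, overwriting a name inside the loop does not lose anything: each iteration's node keeps pointing at the node from the previous iteration. The risk the answer describes is the reverse one, namely that the name no longer refers to the node you think it does.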