I am trying to implement BiLSTM-Max as described in: Supervised Learning of Universal Sentence Representations from Natural Language Inference Data.
I am implementing it in TensorFlow. I started from vanilla LSTM code, but I have already modified it successfully so that it handles variable-length inputs and runs bidirectionally (i.e. a dynamic Bi-LSTM):
# Bi-LSTM; returns a length-n_step list of [batch_size, 2*n_hidden] outputs,
# plus the final forward and backward states (ignored here)
outputs, _, _ = tf.contrib.rnn.static_bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x, dtype=tf.float32)
# Stack and transpose to [batch_size, n_step, 2*n_hidden]
outputs = tf.transpose(tf.stack(outputs), [1, 0, 2])
# Retrieve the last output corresponding to the length of each input sequence
batch_size_ = tf.shape(outputs)[0]
index = tf.range(0, batch_size_) * seq_max_len + (seqlen - 1)
outputs = tf.gather(tf.reshape(outputs, [-1, 2*n_hidden]), index)
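To illustrate what that flatten-and-gather trick is doing, here is a minimal, self-contained toy example (TF 1.x, to match the tf.contrib code above; the shapes, outputs_np, and seqlen_np values are made up for illustration):

import numpy as np
import tensorflow as tf

seq_max_len, n_hidden = 4, 3
# Toy [batch_size, n_step, 2*n_hidden] outputs and per-example lengths
outputs_np = np.arange(2 * 4 * 6, dtype=np.float32).reshape(2, 4, 6)
seqlen_np = np.array([2, 4], dtype=np.int32)  # example 0 only uses 2 steps

outputs = tf.constant(outputs_np)
seqlen = tf.constant(seqlen_np)

batch_size_ = tf.shape(outputs)[0]
# After flattening to [batch_size * n_step, 2*n_hidden], row i*seq_max_len + t
# holds the output of example i at time step t
index = tf.range(0, batch_size_) * seq_max_len + (seqlen - 1)
last_outputs = tf.gather(tf.reshape(outputs, [-1, 2 * n_hidden]), index)

with tf.Session() as sess:
    print(sess.run(last_outputs))  # row t=1 of example 0, row t=3 of example 1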
Next, to turn this into Bi-LSTM-Max, I replaced taking the last output with taking the max across n_steps, as follows:
# Bi-LSTM; returns a length-n_step list of [batch_size, 2*n_hidden] outputs
outputs, _, _ = tf.contrib.rnn.static_bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x, dtype=tf.float32)
# Stack and transpose to [batch_size, n_step, 2*n_hidden]
outputs = tf.transpose(tf.stack(outputs), [1, 0, 2])
# Retrieve the max output across n_steps
outputs = tf.reduce_max(outputs, reduction_indices=[1])
When taking the max over the n_steps dimension, I had assumed that the outputs at indices > seqlen would be 0, so that I could take the max over the entire dimension instead of only over indices 0 to seqlen. On closer inspection, I realized that the values at those unspecified indices may be nonzero due to random initialization, or may simply be whatever value was last written to that memory.
This operation would be trivial with Python arrays; however, I cannot find a simple way to do it with tensor operations. Does anyone have any ideas?
Answer 0 (score: 0)
Probably the easiest way is to manually set the invalid outputs to zero or -∞ before taking the max. You can do this fairly easily with tf.sequence_mask and tf.where:
seq_mask = tf.sequence_mask(seqlen, seq_max_len)                    # [batch_size, seq_max_len]
# Expand the mask to match the [batch_size, seq_max_len, 2*n_hidden] outputs
seq_mask = tf.tile(tf.expand_dims(seq_mask, 2), [1, 1, 2*n_hidden])
# You can also use e.g. -np.inf * tf.ones_like(outputs)
outputs_masked = tf.where(seq_mask, outputs, tf.zeros_like(outputs))
outputs = tf.reduce_max(outputs_masked, axis=1)  # axis is preferred to reduction_indices
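To sanity-check this, here is a toy, self-contained TF 1.x sketch (made-up shapes and random values), using the -np.inf variant from the comment above so that padded steps can never win the max, even when all valid outputs are negative:

import numpy as np
import tensorflow as tf

seq_max_len, n_out = 4, 6  # n_out stands in for 2*n_hidden
outputs = tf.constant(np.random.randn(2, seq_max_len, n_out).astype(np.float32))
seqlen = tf.constant([2, 4], dtype=tf.int32)

seq_mask = tf.sequence_mask(seqlen, seq_max_len)                # [2, 4], bool
seq_mask = tf.tile(tf.expand_dims(seq_mask, 2), [1, 1, n_out])  # [2, 4, 6]
# -inf padding: padded steps never contribute to the max
masked = tf.where(seq_mask, outputs, -np.inf * tf.ones_like(outputs))
max_pool = tf.reduce_max(masked, axis=1)                        # [2, 6]

with tf.Session() as sess:
    print(sess.run(max_pool))  # max over valid time steps only

Note that with zero masking, the pooled value would incorrectly be 0 for any feature whose valid outputs are all negative, which is why the -∞ variant is the safer default.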