I am trying to implement BiLSTM-Max as described in: Supervised Learning of Universal Sentence Representations from Natural Language Inference Data.
I am implementing it in TensorFlow. I started from vanilla LSTM code, but I have already modified it successfully so that it handles variable-length inputs and runs bidirectionally (i.e. a dynamic Bi-LSTM):
# Bi-LSTM; returns a length-n_step list of [batch_size, 2*n_hidden] outputs,
# plus the final forward and backward states (ignored here)
outputs, _, _ = tf.contrib.rnn.static_bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x, dtype=tf.float32)
# Stack and transpose to [batch_size, n_step, 2*n_hidden]
outputs = tf.transpose(tf.stack(outputs), [1, 0, 2])
# Retrieve the last output corresponding to the length of each input sequence
batch_size_ = tf.shape(outputs)[0]
index = tf.range(0, batch_size_) * seq_max_len + (seqlen - 1)
outputs = tf.gather(tf.reshape(outputs, [-1, 2*n_hidden]), index)
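To illustrate what that flatten-and-gather trick is doing, here is a minimal, self-contained toy example (TF 1.x, to match the tf.contrib code above; the shapes, outputs_np, and seqlen_np values are made up for illustration):

import numpy as np
import tensorflow as tf

seq_max_len, n_hidden = 4, 3
# Toy [batch_size, n_step, 2*n_hidden] outputs and per-example lengths
outputs_np = np.arange(2 * 4 * 6, dtype=np.float32).reshape(2, 4, 6)
seqlen_np = np.array([2, 4], dtype=np.int32)  # example 0 only uses 2 steps

outputs = tf.constant(outputs_np)
seqlen = tf.constant(seqlen_np)

batch_size_ = tf.shape(outputs)[0]
# After flattening to [batch_size * n_step, 2*n_hidden], row i*seq_max_len + t
# holds the output of example i at time step t
index = tf.range(0, batch_size_) * seq_max_len + (seqlen - 1)
last_outputs = tf.gather(tf.reshape(outputs, [-1, 2 * n_hidden]), index)

with tf.Session() as sess:
    print(sess.run(last_outputs))  # row t=1 of example 0, row t=3 of example 1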
Next, to turn this into Bi-LSTM-Max, I replaced taking the last output with taking the max across n_steps, as follows:
# Bi-LSTM; returns a length-n_step list of [batch_size, 2*n_hidden] outputs
outputs, _, _ = tf.contrib.rnn.static_bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x, dtype=tf.float32)
# Stack and transpose to [batch_size, n_step, 2*n_hidden]
outputs = tf.transpose(tf.stack(outputs), [1, 0, 2])
# Retrieve the max output across n_steps
outputs = tf.reduce_max(outputs, reduction_indices=[1])
When taking the max over the n_steps dimension, I had assumed that the outputs at indices > seqlen would be 0, so that I could take the max over the entire dimension instead of only over indices 0 to seqlen. On closer inspection, I realized that the values at those unspecified indices may be nonzero due to random initialization, or may simply be whatever value was last written to that memory.
This operation would be trivial with Python arrays; however, I cannot find a simple way to do it with tensor operations. Does anyone have any ideas?
Answer 0 (score: 0)
Probably the easiest way is to manually set the invalid outputs to zero or -∞ before taking the max. You can do this fairly easily with tf.sequence_mask and tf.where:
seq_mask = tf.sequence_mask(seqlen, seq_max_len)                    # [batch_size, seq_max_len]
# Expand the mask to match the [batch_size, seq_max_len, 2*n_hidden] outputs
seq_mask = tf.tile(tf.expand_dims(seq_mask, 2), [1, 1, 2*n_hidden])
# You can also use e.g. -np.inf * tf.ones_like(outputs)
outputs_masked = tf.where(seq_mask, outputs, tf.zeros_like(outputs))
outputs = tf.reduce_max(outputs_masked, axis=1)  # axis is preferred to reduction_indices
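To sanity-check this, here is a toy, self-contained TF 1.x sketch (made-up shapes and random values), using the -np.inf variant from the comment above so that padded steps can never win the max, even when all valid outputs are negative:

import numpy as np
import tensorflow as tf

seq_max_len, n_out = 4, 6  # n_out stands in for 2*n_hidden
outputs = tf.constant(np.random.randn(2, seq_max_len, n_out).astype(np.float32))
seqlen = tf.constant([2, 4], dtype=tf.int32)

seq_mask = tf.sequence_mask(seqlen, seq_max_len)                # [2, 4], bool
seq_mask = tf.tile(tf.expand_dims(seq_mask, 2), [1, 1, n_out])  # [2, 4, 6]
# -inf padding: padded steps never contribute to the max
masked = tf.where(seq_mask, outputs, -np.inf * tf.ones_like(outputs))
max_pool = tf.reduce_max(masked, axis=1)                        # [2, 6]

with tf.Session() as sess:
    print(sess.run(max_pool))  # max over valid time steps only

Note that with zero masking, the pooled value would incorrectly be 0 for any feature whose valid outputs are all negative, which is why the -∞ variant is the safer default.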