Question

I am trying to use the TensorFlow LSTM model to make next-word predictions.

As described in this related question (which has no accepted answer), the example contains pseudocode to extract next-word probabilities:

lstm = rnn_cell.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
state = tf.zeros([batch_size, lstm.state_size])

loss = 0.0
for current_batch_of_words in words_in_dataset:
  # The value of state is updated after processing each batch of words.
  output, state = lstm(current_batch_of_words, state)

  # The LSTM output can be used to make next word predictions
  logits = tf.matmul(output, softmax_w) + softmax_b
  probabilities = tf.nn.softmax(logits)
  loss += loss_function(probabilities, target_words)

I'm confused about how to interpret the probabilities vector. I modified the __init__ function of PTBModel in ptb_word_lm.py to store the probabilities and logits:

class PTBModel(object):
  """The PTB model."""

  def __init__(self, is_training, config):
    # General definition of LSTM (unrolled)
    # identical to tensorflow example ...
    # omitted for brevity ...

    # computing the logits (also from example code)
    logits = tf.nn.xw_plus_b(output,
                             tf.get_variable("softmax_w", [size, vocab_size]),
                             tf.get_variable("softmax_b", [vocab_size]))
    loss = seq2seq.sequence_loss_by_example([logits],
                                            [tf.reshape(self._targets, [-1])],
                                            [tf.ones([batch_size * num_steps])],
                                            vocab_size)
    self._cost = cost = tf.reduce_sum(loss) / batch_size
    self._final_state = states[-1]

    # my addition: storing the probabilities and logits
    self.probabilities = tf.nn.softmax(logits)
    self.logits = logits

    # more model definition ...

Then I printed some information about them in the run_epoch function:

def run_epoch(session, m, data, eval_op, verbose=True):
  """Runs the model on the given data."""
  # first part of function unchanged from example

  for step, (x, y) in enumerate(reader.ptb_iterator(data, m.batch_size,
                                                    m.num_steps)):
    # evaluate probability and logit tensors too:
    cost, state, probs, logits, _ = session.run([m.cost, m.final_state, m.probabilities, m.logits, eval_op],
                                 {m.input_data: x,
                                  m.targets: y,
                                  m.initial_state: state})
    costs += cost
    iters += m.num_steps

    if verbose and step % (epoch_size // 10) == 10:
      print("%.3f perplexity: %.3f speed: %.0f wps, n_iters: %s" %
            (step * 1.0 / epoch_size, np.exp(costs / iters),
             iters * m.batch_size / (time.time() - start_time), iters))
      chosen_word = np.argmax(probs, 1)
      print("Probabilities shape: %s, Logits shape: %s" % 
            (probs.shape, logits.shape) )
      print(chosen_word)
      print("Batch size: %s, Num steps: %s" % (m.batch_size, m.num_steps))

  return np.exp(costs / iters)

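This produces output like this:

0.000 perplexity: 741.577 speed: 230 wps, n_iters: 220
(20, 10000) (20, 10000)
[ 14 1 6 589 1 5 0 87 6 5 3 5 2 2 2 2 6 2 6 1]
Batch size: 1, Num steps: 20

I was expecting the probs vector to be an array of probabilities with one entry for each word in the vocabulary (i.e. with shape (1, vocab_size)). That way I could get the predicted word as shown in the other question.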

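However, the first dimension of the vector is actually equal to the number of steps in the unrolled LSTM (20 with the small config), and I'm not sure what to do with it. To access the predicted word, do I just use the last value, since it is the output of the final step? Or is there something else I'm missing?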

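I tried to understand how predictions are made and evaluated by looking at the implementation of seq2seq.sequence_loss_by_example, which must perform this evaluation. However, it eventually calls a function that doesn't seem to be included in the GitHub repo, so I don't know where else to look.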

I'm quite new to both TensorFlow and LSTMs, so any help is appreciated!

Answer 1

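The output tensor contains the concatenation of the LSTM cell outputs for each time step (see its definition here). You can therefore find the prediction for the next word by taking chosen_word[-1]. If the sequence has been padded to match the unrolled LSTM, take chosen_word[sequence_length - 1] instead.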
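Here is a minimal NumPy sketch of that indexing. It assumes batch_size = 1, so that probs has one row per unrolled step. The toy sizes, the random logits and the sequence_length value are made up for illustration:

```python
import numpy as np

# Hypothetical toy sizes; with batch_size = 1, probs has num_steps rows
num_steps, vocab_size = 20, 10
rng = np.random.default_rng(0)
logits = rng.normal(size=(num_steps, vocab_size))

# Row-wise softmax, which is what tf.nn.softmax does on a 2-D tensor
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)

chosen_word = np.argmax(probs, 1)  # shape (num_steps,): one prediction per step
next_word = chosen_word[-1]        # prediction made after the last input word

# Hypothetical padded case: only the first sequence_length positions are real words
sequence_length = 13
next_word_padded = chosen_word[sequence_length - 1]
print(next_word, next_word_padded)
```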

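The tf.nn.sparse_softmax_cross_entropy_with_logits() op is documented in the public API under a different name. For technical reasons, it calls a generated wrapper function that does not appear in the GitHub repository. The op itself is implemented in C++, here.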
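For each row, the op computes the negative log-probability of the target word under a softmax over the logits. Below is a NumPy sketch of that quantity. The function name is hypothetical, and this is an illustration, not TensorFlow's C++ kernel:

```python
import numpy as np

def sparse_softmax_cross_entropy(logits, labels):
    """Hypothetical NumPy helper: per-row -log(softmax(logits)[label]).

    logits: float array [n, vocab_size]; labels: int array [n] of word ids.
    """
    # Subtract the row max before exponentiating for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the correct word in each row
    return -log_probs[np.arange(len(labels)), labels]

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 0.5, 0.5]])
labels = np.array([0, 2])
losses = sparse_softmax_cross_entropy(logits, labels)
print(losses)
```

The labels are integer word ids rather than one-hot vectors, which is why the "sparse" variant fits targets shaped like tf.reshape(self._targets, [-1]).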

Answer 2

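I'm also implementing a seq2seq model.

So let me try to explain based on my understanding:

The outputs of the LSTM model are a list of num_steps 2D tensors, each of size [batch_size, size].

The line of code: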

output = tf.reshape(tf.concat(1, outputs), [-1, size])

produces a new output, which is a 2D tensor of size [batch_size x num_steps, size].

In your case, batch_size = 1 and num_steps = 20, so the output shape is [20, size].
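To make the row ordering concrete: after the concat and reshape, row b * num_steps + t holds the output of sequence b at step t. With batch_size > 1, the last-step outputs are therefore not simply the last rows. Below is a NumPy sketch with made-up sizes. It mirrors the old tf.concat(concat_dim, values) argument order used in the example:

```python
import numpy as np

# Hypothetical toy sizes (the run in the question used batch_size=1, num_steps=20)
batch_size, num_steps, size = 2, 3, 4

# outputs: list of num_steps arrays of shape [batch_size, size], one per time step.
# Every entry of outputs[t][b] is set to 10 * b + t so the ordering is visible.
outputs = [np.array([[10 * b + t] * size for b in range(batch_size)], dtype=float)
           for t in range(num_steps)]

# NumPy equivalent of tf.reshape(tf.concat(1, outputs), [-1, size])
output = np.concatenate(outputs, axis=1).reshape(-1, size)

# Row b * num_steps + t is sequence b at step t
print(output[:, 0])

# Last time step of every sequence in the batch
last_step = output.reshape(batch_size, num_steps, size)[:, -1]
```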

The line of code:

logits = tf.nn.xw_plus_b(output, tf.get_variable("softmax_w", [size, vocab_size]), tf.get_variable("softmax_b", [vocab_size]))

In other words, multiplying output [batch_size x num_steps, size] by softmax_w [size, vocab_size] gives logits of size [batch_size x num_steps, vocab_size]. In your case, logits has size [20, vocab_size], so the probs tensor has the same size as logits: [20, vocab_size].
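Here is the same shape arithmetic in NumPy, using made-up sizes (rows stands for batch_size * num_steps). tf.nn.xw_plus_b(x, w, b) computes matmul(x, w) + b, so a plain matrix product plus a bias reproduces it:

```python
import numpy as np

# Hypothetical toy sizes standing in for the PTB config
rows, size, vocab_size = 20, 8, 50  # rows = batch_size * num_steps
rng = np.random.default_rng(0)
output = rng.normal(size=(rows, size))
softmax_w = rng.normal(size=(size, vocab_size))
softmax_b = np.zeros(vocab_size)

# Equivalent of tf.nn.xw_plus_b(output, softmax_w, softmax_b)
logits = output @ softmax_w + softmax_b  # [rows, vocab_size]

# Row-wise softmax: each row becomes a distribution over the vocabulary
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)  # same shape as logits
print(logits.shape, probs.shape)
```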

The line of code:

chosen_word = np.argmax(probs, 1)

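produces a chosen_word array of shape [20], with one value per row of probs. Each value is the predicted index of the word that follows the current input word.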
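The PTB iterator produces targets y that are the inputs x shifted one word ahead, so entry t of chosen_word is the guess for y[0, t]. Below is a sketch using a hypothetical word-id sequence and hand-made probabilities in place of real model output:

```python
import numpy as np

# Toy word-id sequence (hypothetical); y is x shifted one word ahead
data = np.array([3, 7, 1, 4, 4, 2])
x = data[:-1].reshape(1, -1)  # inputs,  shape [batch_size=1, num_steps=5]
y = data[1:].reshape(1, -1)   # targets, shape [1, 5]

# Stand-in for model output: probs[t] is the distribution after reading x[0, t]
vocab_size = 8
probs = np.full((x.size, vocab_size), 0.01)
probs[np.arange(x.size), [7, 1, 4, 0, 2]] = 0.93  # hypothetical predictions

chosen_word = np.argmax(probs, 1)        # shape (5,), one id per input position
correct = chosen_word == y.reshape(-1)   # compare with flattened targets, as in the loss
accuracy = correct.mean()
print(chosen_word, accuracy)
```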

The line of code:

loss = seq2seq.sequence_loss_by_example([logits], [tf.reshape(self._targets, [-1])], [tf.ones([batch_size * num_steps])])

computes the softmax cross-entropy loss for the batch of batch_size sequences.
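Below is a NumPy sketch of how that per-position loss becomes the cost and then the perplexity printed by run_epoch. It assumes that, given a single logits tensor and all-ones weights, sequence_loss_by_example returns one cross-entropy value per row. The helper name and toy data are hypothetical:

```python
import numpy as np

def per_row_cross_entropy(logits, targets):
    # Hypothetical helper: -log softmax(logits)[target] for every row, computed stably
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets]

batch_size, num_steps, vocab_size = 1, 20, 10
rng = np.random.default_rng(1)
logits = rng.normal(size=(batch_size * num_steps, vocab_size))
targets = rng.integers(0, vocab_size, size=(batch_size, num_steps))

# One loss value per (sequence, step) position; targets flattened like tf.reshape(y, [-1])
loss = per_row_cross_entropy(logits, targets.reshape(-1))
cost = loss.sum() / batch_size  # same formula as self._cost

# run_epoch adds cost to costs and num_steps to iters, then reports exp(costs / iters)
costs, iters = cost, num_steps
perplexity = np.exp(costs / iters)

# Sanity check: uniform logits give a perplexity equal to vocab_size
uniform = per_row_cross_entropy(np.zeros((num_steps, vocab_size)), targets.reshape(-1))
print(perplexity, np.exp(uniform.mean()))
```

Perplexity is the exponential of the mean per-word cross-entropy. An untrained model that spreads probability evenly scores about vocab_size, which is consistent with the large early value (741.577) in the question's output.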

Predicting the next word using the LSTM ptb model tensorflow example
