Question

While reading TensorFlow's official example code ptb_word_lm.py, I ran into a question about embedding_lookup.

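As far as I can see, it is used only on the input side; the output does not use it. So the loss evaluation cannot benefit from this embedding. What is the benefit of using embedding_lookup here, then? If I want the optimizer to train this word embedding, shouldn't I connect it to the loss function explicitly?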

The source code is as follows:

self._input = input_

batch_size = input_.batch_size
num_steps = input_.num_steps
size = config.hidden_size
vocab_size = config.vocab_size

def lstm_cell():
  # With the latest TensorFlow source code (as of Mar 27, 2017),
  # the BasicLSTMCell will need a reuse parameter which is unfortunately not
  # defined in TensorFlow 1.0. To maintain backwards compatibility, we add
  # an argument check here:
  if 'reuse' in inspect.getargspec(
      tf.contrib.rnn.BasicLSTMCell.__init__).args:
    return tf.contrib.rnn.BasicLSTMCell(
        size, forget_bias=0.0, state_is_tuple=True,
        reuse=tf.get_variable_scope().reuse)
  else:
    return tf.contrib.rnn.BasicLSTMCell(
        size, forget_bias=0.0, state_is_tuple=True)
attn_cell = lstm_cell
if is_training and config.keep_prob < 1:
  def attn_cell():
    return tf.contrib.rnn.DropoutWrapper(
        lstm_cell(), output_keep_prob=config.keep_prob)
cell = tf.contrib.rnn.MultiRNNCell(
    [attn_cell() for _ in range(config.num_layers)], state_is_tuple=True)

self._initial_state = cell.zero_state(batch_size, data_type())

with tf.device("/cpu:0"):
  embedding = tf.get_variable(
      "embedding", [vocab_size, size], dtype=data_type())
  inputs = tf.nn.embedding_lookup(embedding, input_.input_data)  # only use embeddings here

if is_training and config.keep_prob < 1:
  inputs = tf.nn.dropout(inputs, config.keep_prob)

outputs = []
state = self._initial_state
with tf.variable_scope("RNN"):
  for time_step in range(num_steps):
    if time_step > 0: tf.get_variable_scope().reuse_variables()
    (cell_output, state) = cell(inputs[:, time_step, :], state)
    outputs.append(cell_output)

output = tf.reshape(tf.stack(axis=1, values=outputs), [-1, size])
softmax_w = tf.get_variable(
    "softmax_w", [size, vocab_size], dtype=data_type())
softmax_b = tf.get_variable("softmax_b", [vocab_size], dtype=data_type())
logits = tf.matmul(output, softmax_w) + softmax_b
loss = tf.contrib.legacy_seq2seq.sequence_loss_by_example(
    [logits],
    [tf.reshape(input_.targets, [-1])],
    [tf.ones([batch_size * num_steps], dtype=data_type())])
self._cost = cost = tf.reduce_sum(loss) / batch_size
self._final_state = state

if not is_training:
  return

self._lr = tf.Variable(0.0, trainable=False)
tvars = tf.trainable_variables()
grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars),
                                  config.max_grad_norm)
optimizer = tf.train.GradientDescentOptimizer(self._lr)
self._train_op = optimizer.apply_gradients(
    zip(grads, tvars),
    global_step=tf.contrib.framework.get_or_create_global_step())

self._new_lr = tf.placeholder(
    tf.float32, shape=[], name="new_learning_rate")
self._lr_update = tf.assign(self._lr, self._new_lr)

Answer 1

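The output does in fact use the embedding lookup. A TensorFlow program is usually structured into a construction phase, which builds the graph, and an execution phase, which uses a session to run the ops in that graph.

In your case, to compute the loss, the following nodes of the graph have to be evaluated, each one depending on the node that follows it: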

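loss -> logits -> output -> outputs -> cell -> inputs -> embedding_lookup

Another way to look at it is as nested function calls:

loss(logits(output(outputs(cell_output(cell(inputs(embedding_lookup(embedding))))))))

I have omitted the extra arguments of each function (op) to make this clearer. Because embedding is a trainable variable on this path, tf.gradients(cost, tvars) produces a gradient for it, and the optimizer updates it like any other weight. No explicit extra connection to the loss is needed.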
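The dependency chain can be checked numerically with a minimal sketch. This is not the PTB model: it assumes a toy setup with no LSTM, where the looked-up vector feeds a softmax layer directly, and every name and number below (loss_fn, numeric_grad, the small tables) is made up for illustration. It is plain Python rather than TensorFlow, and it estimates the gradient of the loss with respect to each embedding entry by finite differences.

```python
import math


def embedding_lookup(embedding, ids):
    """Select the rows of the embedding table named by ids."""
    return [embedding[i] for i in ids]


def loss_fn(embedding, softmax_w, input_ids, target_ids):
    """Summed softmax cross-entropy; the embedding enters only via the inputs."""
    inputs = embedding_lookup(embedding, input_ids)
    total = 0.0
    for x, target in zip(inputs, target_ids):
        logits = [
            sum(x[k] * softmax_w[k][j] for k in range(len(x)))
            for j in range(len(softmax_w[0]))
        ]
        top = max(logits)  # subtract the max for numerical stability
        log_z = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_z - logits[target]
    return total


def numeric_grad(embedding, softmax_w, input_ids, target_ids, eps=1e-6):
    """Central-difference estimate of d(loss)/d(embedding[r][c])."""
    grads = []
    for row in embedding:
        grad_row = []
        for c in range(len(row)):
            old = row[c]
            row[c] = old + eps
            up = loss_fn(embedding, softmax_w, input_ids, target_ids)
            row[c] = old - eps
            down = loss_fn(embedding, softmax_w, input_ids, target_ids)
            row[c] = old  # restore the original value
            grad_row.append((up - down) / (2 * eps))
        grads.append(grad_row)
    return grads


# Hypothetical toy sizes: vocab_size = 4, hidden size = 2.
embedding = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6], [0.7, -0.8]]
softmax_w = [[0.2, -0.1, 0.4, 0.3], [-0.3, 0.5, 0.1, -0.2]]
input_ids = [0, 2]   # only rows 0 and 2 are looked up
target_ids = [2, 1]

grads = numeric_grad(embedding, softmax_w, input_ids, target_ids)
for row_id, grad_row in enumerate(grads):
    print(row_id, grad_row)
```

Rows 0 and 2, the ones that were looked up, get a nonzero gradient even though the embedding appears only on the input side; rows 1 and 3 were never read, so their gradient is zero. In the real model the same thing happens through the LSTM cells, with backpropagation taking the place of the finite differences used here.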
