Question

I am trying to understand this TensorFlow code, which is part of an implementation of the word2vec skip-gram model.

Specifically, I am trying to understand how sampled_softmax_loss knows which embeddings to use from the softmax_weights matrix.

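import math
import tensorflow as tf

# graph, batch_size, embedding_size, vocabulary_size, num_sampled and
# valid_examples are defined elsewhere in the full notebook (linked below).
with graph.as_default(), tf.device('/cpu:0'):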

  # Input data.
  train_dataset = tf.placeholder(tf.int32, shape=[batch_size])
  train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])
  valid_dataset = tf.constant(valid_examples, dtype=tf.int32)

  # Variables.
  embeddings = tf.Variable(
    tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
  softmax_weights = tf.Variable(
    tf.truncated_normal([vocabulary_size, embedding_size],
                         stddev=1.0 / math.sqrt(embedding_size)))
  softmax_biases = tf.Variable(tf.zeros([vocabulary_size]))

  # Model.
  # Look up embeddings for inputs.
  embed = tf.nn.embedding_lookup(embeddings, train_dataset)
  # Compute the softmax loss, using a sample of the negative labels each time.
  loss = tf.reduce_mean(
    tf.nn.sampled_softmax_loss(weights=softmax_weights,
                               biases=softmax_biases,
                               inputs=embed,
                               labels=train_labels,
                               num_sampled=num_sampled,
                               num_classes=vocabulary_size))

The full code is here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/udacity/5_word2vec.ipynb

softmax_weights is a matrix in which each row represents the embedding of a particular word (class).

sampled_softmax_loss is TensorFlow's implementation of negative sampling.

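train_labels, which is passed as the 'labels' argument, is an array of numbers. Each number can be used as a key to get the corresponding word, and it can also be used as a key to get a particular embedding from 'embeddings' above, as in the line "embed = tf.nn.embedding_lookup(embeddings, train_dataset)".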

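I would like to know whether sampled_softmax_loss also uses train_labels so that each number picks out a particular embedding in softmax_weights and a particular bias in softmax_biases. Does it then use random embeddings from softmax_weights as the negative samples?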

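This may get flagged as a possible duplicate of "Tensorflow negative sampling", which asks essentially the same question, but none of the answers there specifically explain how particular embeddings are extracted from softmax_weights and softmax_biases.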

Answer 1

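The embedding lookup is applied only to the inputs: embed = tf.nn.embedding_lookup(embeddings, train_dataset). The function tf.nn.sampled_softmax_loss() then takes that tensor of shape [batch_size, embedding_size] and computes a sampled softmax over the target label and num_sampled random labels.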
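As a rough illustration of that first step, here is a plain-Python sketch (not TensorFlow's implementation; the toy table and the helper name lookup_rows are made up for this example). An embedding lookup amounts to selecting rows by integer id, so a batch of ids becomes a [batch_size, embedding_size] array:

```python
# Hypothetical toy table: vocabulary_size = 4, embedding_size = 3.
embeddings = [
    [0.1, 0.2, 0.3],   # word id 0
    [0.4, 0.5, 0.6],   # word id 1
    [0.7, 0.8, 0.9],   # word id 2
    [1.0, 1.1, 1.2],   # word id 3
]

def lookup_rows(table, ids):
    """Return the rows of `table` selected by `ids` (hypothetical helper)."""
    return [table[i] for i in ids]

train_dataset = [2, 0]                       # batch_size = 2 input word ids
embed = lookup_rows(embeddings, train_dataset)
print(len(embed), len(embed[0]))             # batch_size, embedding_size
```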

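The "output" side does not need to know anything about embeddings. The only embeddings it cares about are the ones that correspond to the inputs.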

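You can think of tf.nn.sampled_softmax_loss() as a single-layer neural network with an input of size embedding_size and an output of size 1 + num_sampled. The rows of softmax_weights and softmax_biases for the true label and the sampled labels are selected with an embedding_lookup, so only those rows take part in the computation.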
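The single-layer view can be sketched in plain Python for one training example. This is a simplified illustration under stated assumptions, not TensorFlow's actual code: the function name sampled_softmax_loss_sketch is hypothetical, negatives are drawn uniformly (TensorFlow defaults to a log-uniform sampler), and the sketch skips the sampling-probability correction and the handling of negatives that happen to equal the true label.

```python
import math
import random

def sampled_softmax_loss_sketch(weights, biases, embed_row, true_label,
                                num_sampled, num_classes, rng):
    """Loss for ONE example (hypothetical, simplified sketch)."""
    # Assumption: uniform negatives, not TensorFlow's default sampler.
    negatives = [rng.randrange(num_classes) for _ in range(num_sampled)]
    # The true label comes first, followed by the sampled labels.
    class_ids = [true_label] + negatives
    # Row lookup: only these 1 + num_sampled rows of weights/biases are used.
    logits = []
    for c in class_ids:
        dot = sum(w * x for w, x in zip(weights[c], embed_row))
        logits.append(dot + biases[c])
    # Softmax cross-entropy where the correct "class" is position 0.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0], class_ids

rng = random.Random(0)
num_classes, embedding_size = 6, 3
softmax_weights = [[rng.uniform(-1, 1) for _ in range(embedding_size)]
                   for _ in range(num_classes)]
softmax_biases = [0.0] * num_classes
embed_row = [0.5, -0.2, 0.1]                 # one row of `embed`

loss, used = sampled_softmax_loss_sketch(
    softmax_weights, softmax_biases, embed_row,
    true_label=4, num_sampled=2, num_classes=num_classes, rng=rng)
print(used, loss)
```

The label ids index into softmax_weights and softmax_biases the same way the input ids index into embeddings, which is the point of the question: no separate mapping is needed.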

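Because in word2vec the inputs and the output labels come from the same vocabulary, the vector in softmax_weights that corresponds to a word can be thought of as an additional context embedding for that word. It is not required, however, that the inputs and outputs share a vocabulary.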

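The purpose of all this is to speed up training. If your vocabulary has 1 million words and the embedding size is 10, each step needs to update 2e7 weights (1 million x 10 for layer 1, 10 x 1 million for layer 2). With negative sampling, if you sample 10 negative labels, you only need to update 120 weights per sample (10 for the 10-dim embedding + (1 positive + 10 negatives) * 10).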
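The two counts above can be reproduced with a few lines of arithmetic. The variable names are chosen for this example only, and biases are left out, as in the figures quoted above:

```python
vocabulary_size = 1_000_000
embedding_size = 10
num_sampled = 10

# Full softmax: embedding matrix (layer 1) plus output layer (layer 2).
full_softmax_weights = (vocabulary_size * embedding_size
                        + embedding_size * vocabulary_size)

# Sampled: one input embedding row plus (1 true + num_sampled) output rows.
sampled_weights = embedding_size + (1 + num_sampled) * embedding_size

print(full_softmax_weights, sampled_weights)  # 20000000 120
```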
