Question

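Every TensorFlow implementation of Word2Vec that I have seen includes a bias term in the negative-sampling softmax function, including the one on the official TensorFlow website: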

https://www.tensorflow.org/tutorials/word2vec#vector-representations-of-words

loss = tf.reduce_mean(
  tf.nn.nce_loss(weights=nce_weights,
                 biases=nce_biases,
                 labels=train_labels,
                 inputs=embed,
                 num_sampled=num_sampled,
                 num_classes=vocabulary_size))

And this one is from Google's free Deep Learning course: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/udacity/5_word2vec.ipynb

loss = tf.reduce_mean(
    tf.nn.sampled_softmax_loss(weights=softmax_weights, biases=softmax_biases, inputs=embed,
                               labels=train_labels, num_sampled=num_sampled, num_classes=vocabulary_size))

However, in the lectures by both Andrew Ng and Richard Socher, the negative-sampling softmax has no bias.

Even in the work where this idea originated, Mikolov states:

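Biases are not used in the neural network, as no significant improvement of performance was observed - following Occam's razor, the solution is as simple as it needs to be.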

Mikolov, T.: Statistical Language Models Based on Neural Networks, p. 29 http://www.fit.vutbr.cz/~imikolov/rnnlm/thesis.pdf

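So why do the official TensorFlow implementations have a bias, and why does there seem to be no option to leave the bias out of the sampled_softmax_loss function?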

Answer 1

The exercise you link defines softmax_biases to be zeros:

softmax_biases = tf.Variable(tf.zeros([vocabulary_size]))

That is, they are not using any actual bias in their word2vec example.

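The sampled_softmax_loss() function is generic and is used for many neural networks. Its decision to require a biases argument is unrelated to what is best for one particular neural-network application (word2vec), and it accommodates the word2vec case by allowing (as here) all zeros.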
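
To make the "all zeros" point concrete, here is a minimal plain-Python sketch of a negative-sampling loss for a single (input, target) pair. It is not TensorFlow's implementation of sampled_softmax_loss or nce_loss; the helper names (log_sigmoid, dot, negative_sampling_loss) and the toy sizes are made up for illustration. It only shows that a bias vector of zeros gives the same loss as having no bias term at all.

```python
import math
import random


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def negative_sampling_loss(embed, weights, target, negatives, biases=None):
    """Negative-sampling loss for one (input, target) pair.

    Hypothetical helper for illustration: biases=None means the
    model has no bias term; otherwise biases[j] is added to logit j.
    """
    def logit(j):
        b = 0.0 if biases is None else biases[j]
        return dot(weights[j], embed) + b

    # Push the true class up and each sampled negative class down.
    loss = -log_sigmoid(logit(target))
    for j in negatives:
        loss -= log_sigmoid(-logit(j))
    return loss


random.seed(0)
vocabulary_size, embedding_size = 10, 4  # toy sizes, assumed
weights = [[random.uniform(-1, 1) for _ in range(embedding_size)]
           for _ in range(vocabulary_size)]
embed = [random.uniform(-1, 1) for _ in range(embedding_size)]
target, negatives = 3, [1, 7, 8]

loss_no_bias = negative_sampling_loss(embed, weights, target, negatives)
loss_zero_bias = negative_sampling_loss(
    embed, weights, target, negatives, biases=[0.0] * vocabulary_size)

# A zero bias vector is equivalent to having no bias term.
print(loss_no_bias == loss_zero_bias)
```

One caveat worth keeping in mind: a tf.Variable initialized to zeros is trainable by default, so the optimizer can move it away from zero during training. If the goal is to keep the bias out of the model entirely, passing a constant such as tf.zeros([vocabulary_size]), or creating the variable with trainable=False, should keep it fixed at zero.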
