Why does TensorBoard show the wrong cosine distance?

Date: 2019-07-12 15:07:39

Tags: python tensorboard word-embedding

I want to visualize word embeddings in the TensorBoard Projector, but the cosine distances it shows seem to be incorrect.

If I compute the cosine distances with sklearn, I get different results.
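To cross-check the Projector's neighbor list, the distances can be computed directly in the original 300-dimensional space. The following is a minimal sketch, assuming the embeddings are stacked row-wise in a NumPy array in the same order as the metadata labels; `cosine_distance_matrix`, `nearest_neighbors` and the toy data are hypothetical names used only for illustration. The distance here is 1 - cos(x, y), the same quantity `sklearn.metrics.pairwise.cosine_distances` returns.

```python
import numpy as np

def cosine_distance_matrix(X):
    """Pairwise cosine distances 1 - cos(x_i, x_j) between the rows of X.

    Should agree with sklearn.metrics.pairwise.cosine_distances(X)
    up to floating-point error.
    """
    X = np.asarray(X, dtype=np.float64)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / norms                     # assumes no all-zero rows
    dist = 1.0 - unit @ unit.T           # 1 - cosine similarity
    np.clip(dist, 0.0, 2.0, out=dist)    # drop tiny negative rounding errors
    return dist

def nearest_neighbors(X, labels, query, k=5):
    """Return the k labels closest to `query` by cosine distance, with distances."""
    dist = cosine_distance_matrix(X)
    i = labels.index(query)
    order = np.argsort(dist[i])
    # Skip the query itself (distance 0) and keep the first k others
    return [(labels[j], float(dist[i, j])) for j in order if j != i][:k]

# Hypothetical toy data standing in for category_embeddings
labels = ["Category 1", "Category 2", "Category 3"]
X = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
print(nearest_neighbors(X, labels, "Category 1", k=2))
```

With the toy data, "Category 2" comes first (distance about 0.29) and "Category 3" second (distance 1.0).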

Am I using the TensorBoard Projector incorrectly?

TensorBoard: https://i.imgur.com/2hRtXym.png

Sklearn: https://i.imgur.com/49OaiEU.png

import os

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.train.Saver)
from tensorflow.contrib.tensorboard.plugins import projector

LOG_DIR = 'logs'
os.makedirs(LOG_DIR, exist_ok=True)  # metadata.tsv is written before the FileWriter creates the dir
metadata = os.path.join(LOG_DIR, 'metadata.tsv')

# category_embeddings is a dict defined elsewhere:
# category_embeddings["Category 1"] -> array([[..., ..., ...]])  # shape (1, 300)

arr = []
for category in category_embeddings:
    arr.append(category_embeddings[category][0])  # take the 300-d row vector
embds_arr = np.asarray(arr)

# One label per line, in the same order as the rows of embds_arr
with open(metadata, 'w', encoding="utf-8") as metadata_file:
    for key in category_embeddings.keys():
        metadata_file.write(key + "\n")

embds = tf.Variable(embds_arr, name='embeds')

with tf.Session() as sess:
    saver = tf.train.Saver([embds])

    sess.run(embds.initializer)
    saver.save(sess, os.path.join(LOG_DIR, 'category.ckpt'))

    # Tell the projector which tensor to display and where its labels are
    config = projector.ProjectorConfig()
    embedding = config.embeddings.add()
    embedding.tensor_name = embds.name
    embedding.metadata_path = metadata

    projector.visualize_embeddings(tf.summary.FileWriter(LOG_DIR), config)

1 answer:

Answer 0 (score: 0)

Solved.

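I tested it with different datasets and numbers of training epochs, and this appears to be a bug in TensorBoard. Sklearn returns the correct results for the original vector space, whereas TensorBoard may be computing the distances from the dimensionality-reduced data.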
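The hypothesis above can be illustrated numerically: cosine distances measured after projecting to a few dimensions generally differ from those in the original space. The sketch below uses a plain SVD-based PCA on random data as a stand-in; it is not TensorBoard's actual code, and `pca_project`, `cosine_distances` and the random matrix are hypothetical. Note that PCA also centers the data, and cosine distance is not invariant to moving the origin, so the centering step alone can change the distances.

```python
import numpy as np

def pca_project(X, n_components=3):
    """Project the rows of X onto their top principal components (centered SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def cosine_distances(X):
    """Pairwise 1 - cosine similarity between the rows of X."""
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 300))   # hypothetical stand-in for 50 category vectors

d_orig = cosine_distances(X)                   # distances in the original 300-d space
d_proj = cosine_distances(pca_project(X, 3))   # distances after reduction to 3-d
print("max |difference|:", np.abs(d_orig - d_proj).max())
```

If the Projector's neighbor list matches distances like `d_proj` rather than `d_orig`, that would be consistent with this explanation.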

https://github.com/tensorflow/tensorboard/issues/2421