Question

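I'm trying to solve a deep learning text classification problem, so I need to vectorize the text input with Word2Vec in order to feed it to a neural network.

So I downloaded Google's pretrained Word2Vec model: https://github.com/3Top/word2vec-api

And loaded it with gensim: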

import gensim
model = gensim.models.KeyedVectors.load_word2vec_format('Word2Vec.bin', binary=True)

When I try to print a specific word:

print(model['cat'])
# => expected output: 0.47385435 (or something)
# => actual output: array with hundreds of floats between -1 and 1

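Why don't I just get one vector for one word? Isn't that the point?

Bonus question: can I load the 3M word vectors from Google's pretrained Word2Vec model into a MongoDB database? (Columns: id - word (string) - vector (float).) I ask because loading the model from a .bin or .txt file takes a minute.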

Answer 1

When I try to print a specific word:

print(model['cat'])
# => expected output: 0.47385435 (or something)
# => actual output: array with hundreds of floats between -1 and 1

Why don't I just get one vector for one word? Isn't that the point?

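That "array with hundreds of floats between -1 and 1" is the vector for the one word you asked for.

Why would you expect a scalar (0.47385435) when you look up a vector?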
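To make the distinction concrete, here is a minimal sketch in plain Python. The table `toy_vectors`, its made-up 4-dimensional values, and the helper `cosine_similarity` are all illustrative assumptions, not part of gensim or of the pretrained model; a real model simply uses many more dimensions per word.

```python
import math

# Toy embedding table (hypothetical values): every word maps to ONE vector,
# i.e. one fixed-length list of floats. A real pretrained model works the
# same way, just with hundreds of dimensions instead of 4.
toy_vectors = {
    "cat": [0.9, 0.8, 0.1, -0.2],
    "dog": [0.8, 0.9, 0.2, -0.1],
    "car": [-0.7, 0.1, 0.9, 0.6],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


cat = toy_vectors["cat"]

# The lookup returns a single vector; its length is the embedding dimension.
print(len(cat))

# Comparing whole vectors is what lets "cat" sit closer to "dog" than to "car".
print(cosine_similarity(cat, toy_vectors["dog"]))
print(cosine_similarity(cat, toy_vectors["car"]))
```

A single float per word could not carry this kind of information, which is why the lookup gives you the whole array.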

You need to read: https://www.tensorflow.org/tutorials/word2vec
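As for the bonus question: the same point applies to the database schema. The vector column has to hold the whole array for each word, not a single float. The following is only a sketch under stated assumptions: it uses the standard-library `sqlite3` as a stand-in for MongoDB so that it runs without a server, and the table name, column names, toy values, and the `lookup` helper are all hypothetical.

```python
import sqlite3
from array import array

# Hypothetical toy data; in practice each entry would be a word and its
# full vector taken from the loaded model.
toy_vectors = {
    "cat": [0.5, -0.25, 0.125],
    "dog": [0.75, 0.0, -0.5],
}

# In-memory database used as a stand-in for a real document store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vectors (word TEXT PRIMARY KEY, vector BLOB)")

# Store each vector as one value: the floats packed as 32-bit binary data.
for word, values in toy_vectors.items():
    blob = array("f", values).tobytes()
    conn.execute("INSERT INTO vectors (word, vector) VALUES (?, ?)", (word, blob))


def lookup(word):
    """Return the stored vector for `word` as a list of floats, or None."""
    row = conn.execute(
        "SELECT vector FROM vectors WHERE word = ?", (word,)
    ).fetchone()
    if row is None:
        return None
    vec = array("f")
    vec.frombytes(row[0])
    return vec.tolist()


print(lookup("cat"))
```

Whether a per-word lookup like this actually beats loading the file once depends on how many words you need per request; that was not measured here.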
