How do I get a single vector for a single word using Word2Vec?

Asked: 2017-03-18 20:29:12

Tags: python deep-learning word2vec

I'm working on a deep learning text classification problem, so I need to vectorize my text input with Word2Vec before feeding it to a neural network.

So I downloaded Google's pretrained Word2Vec model: https://github.com/3Top/word2vec-api

And loaded it with gensim:

import gensim
model = gensim.models.KeyedVectors.load_word2vec_format('Word2Vec.bin', binary=True)

When I try to print a specific word:

print(model['cat'])
# => expected output: 0.47385435 (or something)
# => actual output: array with hundreds of floats between -1 and 1

Why don't I just get one vector for one word? Isn't that the point?

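Bonus question: can I load the 3M word vectors from Google's pretrained Word2Vec model into a MongoDB database? (Columns: id - word (string) - vector (float).) I ask because loading the model from the .bin or .txt file takes a minute.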

1 Answer:

Answer 0: (score: 1)

When I try to print a specific word:

print(model['cat'])
# => expected output: 0.47385435 (or something)
# => actual output: array with hundreds of floats between -1 and 1
Why don't I just get one vector for one word? Isn't that the point?

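That "array with hundreds of floats between -1 and 1" is the single vector for the word. A vector is, by definition, a list of numbers, and Word2Vec represents each word as one such list.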

Why would you expect a scalar (0.47385435) when you ask for a vector?
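To make the difference concrete, here is a minimal sketch that does not use the Google model at all. The `toy_vectors` table, its 4-dimensional values, and the `cosine_similarity` helper are all made up for illustration; a real pretrained model stores one much longer array per word (the array printed in the question). The point is only that one word maps to one array, and that the array as a whole, not any single float in it, carries the meaning.

```python
import math

# Hypothetical toy embeddings: 4 made-up dimensions per word.
# A real pretrained model stores hundreds of floats per word instead.
toy_vectors = {
    "cat": [0.12, -0.45, 0.33, 0.08],
    "dog": [0.10, -0.40, 0.30, 0.15],
    "car": [-0.50, 0.22, -0.05, 0.41],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


cat = toy_vectors["cat"]

# One word -> one vector, and that vector has several components.
print("dimensions for 'cat':", len(cat))

# Words are compared through their whole vectors, not through one number each.
print("cat vs dog:", cosine_similarity(cat, toy_vectors["dog"]))
print("cat vs car:", cosine_similarity(cat, toy_vectors["car"]))
```

With the model loaded as in the question, `model['cat']` should likewise return a NumPy array, so checking `model['cat'].shape` is a quick way to see how many dimensions your particular file uses.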

You should read: https://www.tensorflow.org/tutorials/word2vec