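I'm working on a deep-learning text classification problem, so I need to vectorize my text input with Word2Vec before feeding it to a neural network.
So I downloaded Google's pretrained Word2Vec model: https://github.com/3Top/word2vec-api
and loaded it with gensim: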
import gensim
model = gensim.models.KeyedVectors.load_word2vec_format('Word2Vec.bin', binary=True)
When I try to print a specific word:
print(model['cat'])
# => expected output: 0.47385435 (or something)
# => actual output: array with hundreds of floats between -1 and 1
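Why don't I just get one vector for one word? Isn't that the point?
Bonus question: can I load the 3M word vectors from Google's pretrained Word2Vec model into a MongoDB database (columns: id, word (string), vector (float))? I ask because loading the model from a .bin or .txt file takes a minute.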
Answer 0 (score: 1)
When I try to print a specific word:
print(model['cat'])
# => expected output: 0.47385435 (or something)
# => actual output: array with hundreds of floats between -1 and 1
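Why don't I just get one vector for one word? Isn't that the point?
The "array with hundreds of floats between -1 and 1" is the word vector: one word maps to one array, not to one number.
Why would you expect a scalar (0.47385435) when you ask for a vector?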
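To make the vector/scalar distinction concrete, here is a minimal sketch in plain Python. The `toy_vectors` table, its 4-dimensional made-up values, and the `cosine_similarity` helper are all hypothetical stand-ins for illustration; they are not taken from the real model, which stores far longer vectors. The point is only that looking up a word returns a list of floats, while a single number appears once you compare two vectors.

```python
import math

# Hypothetical toy embedding table with made-up 4-dimensional vectors.
# A real pretrained model stores one much longer vector per word.
toy_vectors = {
    "cat": [0.5, -0.1, 0.3, 0.8],
    "dog": [0.4, -0.2, 0.35, 0.7],
    "car": [-0.6, 0.9, 0.1, -0.2],
}


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Looking up a word yields the whole vector, not a single float.
cat = toy_vectors["cat"]
print(len(cat), cat)

# A scalar only shows up when two vectors are compared.
sim_cat_dog = cosine_similarity(cat, toy_vectors["dog"])
sim_cat_car = cosine_similarity(cat, toy_vectors["car"])
print(sim_cat_dog, sim_cat_car)
```

With the gensim model from the question, the equivalent checks would be `model['cat'].shape` for the vector length (300 if the file really is the Google News model, which is an assumption here) and `model.similarity('cat', 'dog')` for a single similarity score.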