Using the Google word2vec .bin file in gensim (Python)

Asked: 2013-10-11 09:58:15

Tags: python gensim word2vec

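I am trying to get started by loading the pretrained .bin files from the Google word2vec site (freebase-vectors-skipgram1000.bin.gz) into the gensim implementation of word2vec. The model loads fine

using:

model = word2vec.Word2Vec.load_word2vec_format('...../free....-en.bin', binary=True)

and it creates a model object: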

>>> print model
<gensim.models.word2vec.Word2Vec object at 0x105d87f50>

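But when I run the most_similar function, it cannot find the words in the vocabulary. The error output is below.

Any idea where I am going wrong?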

>>> model.most_similar(['girl', 'father'], ['boy'], topn=3)
2013-10-11 10:22:00,562 : WARNING : word 'girl' not in vocabulary; ignoring it
2013-10-11 10:22:00,562 : WARNING : word 'father' not in vocabulary; ignoring it
2013-10-11 10:22:00,563 : WARNING : word 'boy' not in vocabulary; ignoring it
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/....../anaconda/python.app/Contents/lib/python2.7/site-packages/gensim-0.8.7/py2.7.egg/gensim/models/word2vec.py", line 312, in most_similar
    raise ValueError("cannot compute similarity with no input")
ValueError: cannot compute similarity with no input

2 answers:

Answer 0 (score: 7)

The words in '...../free....-en.bin' have the form:

en/boardwalk_chapel en/mutsu_munemitsu en/goffstown en/yaw_axis en/john_e_fogarty_international_center en/francielle_manoel_alberto en/shinji_harada

So when you look up 'girl', it is not there.
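
One way to confirm this before calling most_similar is to test each query word against the vocabulary. The sketch below is only an illustration: split_by_vocabulary is a hypothetical helper, and sample_vocab is a small stand-in built from the entries listed above, not the real model. With a loaded model you would pass its vocabulary mapping instead; as far as I know that is model.vocab in gensim releases of that era and key_to_index on the keyed vectors in recent ones, so treat the attribute name as an assumption to check against your version.

```python
def split_by_vocabulary(words, vocab):
    """Hypothetical helper: split `words` into (present, missing) for a vocabulary.

    `vocab` can be any container that supports `in`, for example a dict
    keyed by word.
    """
    present = [w for w in words if w in vocab]
    missing = [w for w in words if w not in vocab]
    return present, missing


# Stand-in vocabulary using entries quoted in the answer above.
# Assumption: with a real model, pass its vocabulary mapping here instead.
sample_vocab = {"en/goffstown": 0, "en/yaw_axis": 1, "en/shinji_harada": 2}

present, missing = split_by_vocabulary(["girl", "en/goffstown"], sample_vocab)
print("present:", present)
print("missing:", missing)
```

If every query word ends up in missing, most_similar has nothing left to work with, which matches the "cannot compute similarity with no input" error in the question.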

Answer 1 (score: 2)

To expand a bit on Sergio's answer, the "words" are actually Freebase identifiers, so "girl" is represented by /en/girl (for freebase-vectors-skipgram1000-en.bin.gz) or by its MID equivalent /m/05r655 (for freebase-vectors-skipgram1000.bin.gz).

https://www.freebase.com/m/05r655

https://www.freebase.com/en/girl
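
For the -en file, the query from the question could be rewritten along these lines. This is a minimal sketch: to_freebase_key is a hypothetical helper that only handles the simple lowercase-and-underscore case, and whether /en/father and /en/boy actually exist in the file is an assumption, so check the keys against the vocabulary first.

```python
def to_freebase_key(word, prefix="/en/"):
    """Hypothetical helper: turn a plain word into a Freebase-style key.

    Only covers the simple case, e.g. 'girl' becomes '/en/girl' and
    'yaw axis' becomes '/en/yaw_axis'.
    """
    return prefix + word.strip().lower().replace(" ", "_")


positive = [to_freebase_key(w) for w in ["girl", "father"]]
negative = [to_freebase_key(w) for w in ["boy"]]
print(positive, negative)

# With the -en model loaded as in the question, the call would then be
# (assuming these keys are present in the file):
# model.most_similar(positive=positive, negative=negative, topn=3)
```

For the file without the -en suffix the keys are MIDs such as /m/05r655, which cannot be derived from the word itself and have to be looked up.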