gensim: KeyError: "word 'good' not in vocabulary"

Time: 2019-03-11 04:47:01

Tags: python-3.x gensim

I am running the code below, but gensim's Word2Vec raises a "word not in vocabulary" error. Can you tell me how to fix it?

This is my file (file.txt):

'intrepid', 'bumbling', 'duo', 'deliver', 'good', 'one', 'better', 'offering', 'considerable', 'cv', 'freshly', 'qualified', 'private', ..

This is my code:

    import gensim
    with open('file.txt', 'r') as myfile:
        data = myfile.read()



    model = gensim.models.Word2Vec(data,min_count=1,size=32)
    w1 = "good"
    model.wv.most_similar (positive=w1)

Output:

KeyError: "word 'good' not in vocabulary"


---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-34-22572d5a8082> in <module>()
      7 model = gensim.models.Word2Vec(data,min_count=1,size=32)
      8 w1 = "good"
----> 9 model.wv.most_similar (positive=w1)

C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\keyedvectors.py in most_similar(self, positive, negative, topn, restrict_vocab, indexer)
    529                 mean.append(weight * word)
    530             else:
--> 531                 mean.append(weight * self.word_vec(word, use_norm=True))
    532                 if word in self.vocab:
    533                     all_words.add(self.vocab[word].index)

C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\keyedvectors.py in word_vec(self, word, use_norm)
    450             return result
    451         else:
--> 452             raise KeyError("word '%s' not in vocabulary" % word)
    453 
    454     def get_vector(self, word):

KeyError: "word 'good' not in vocabulary"

1 answer:

Answer 0 (score: 1)

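import gensim

with open('lastlast.txt', 'r') as myfile:
    raw_data = myfile.read()

# Treat line breaks as separators too, then split into individual items.
split_data = raw_data.replace('\n', ',').split(',')
# Strip the quotes and spaces around each word, then drop empty items.
data = [i.replace("'", '').replace(' ', '') for i in split_data]
data = [i for i in data if i != ""]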
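As an alternative to chained `replace` calls, the quoted words can be pulled out with a regular expression. This is only a minimal sketch: the helper name `extract_quoted_words` and the `sample` string are made up for illustration, and it assumes every word in the file is wrapped in single quotes and contains no apostrophe of its own.

```python
import re

def extract_quoted_words(raw_text):
    """Return the contents of every single-quoted item in raw_text, in order."""
    return re.findall(r"'([^']*)'", raw_text)

# Hypothetical sample mimicking the layout of file.txt, including a stray ".."
sample = "'intrepid', 'bumbling', 'duo', 'deliver', 'good', 'one', ..\n'better', 'offering'"
tokens = extract_quoted_words(sample)
print(tokens)
```

Anything outside a pair of quotes (commas, spaces, line breaks, the stray `..`) is skipped, so no separate clean-up pass is needed under those assumptions.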

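The first argument should be an iterable of sentences, where each sentence is itself a list of tokens. Because `data` is only a single sentence (a flat list of words), passing it directly makes Word2Vec treat every word as a sentence and every character as a token, whereas `[data]` makes every word a token. The raw string in the question behaves the same way, since iterating over a string yields its characters. From the documentation: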

>>> model = gensim.models.Word2Vec([data],min_count=1,size=32)
>>> model = Word2Vec.load("word2vec.model")
>>> model.train([["hello", "world"]], total_examples=1, epochs=1)

Your solution: if you now run the following, you will get a result.

>>> model.wv.most_similar(['good'])
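To see why the original call never learned the word 'good', here is a small pure-Python sketch of the "iterable of sentences" idea. It does not call gensim at all. The helper `tokens_seen` is hypothetical and only imitates the nested loop (over sentences, then over tokens) that a sentence-based trainer performs on its first argument.

```python
from collections import Counter

def tokens_seen(corpus):
    """Count tokens with an outer loop over sentences and an inner loop over tokens."""
    counts = Counter()
    for sentence in corpus:
        for token in sentence:
            counts[token] += 1
    return counts

raw_text = "'intrepid', 'bumbling', 'duo', 'deliver', 'good'"
words = ["intrepid", "bumbling", "duo", "deliver", "good"]

from_string = tokens_seen(raw_text)  # each character is a one-character "sentence"
from_words = tokens_seen(words)      # each word is a sentence, each character a token
from_nested = tokens_seen([words])   # one sentence whose tokens are whole words

print("good" in from_string)  # False
print("good" in from_words)   # False
print("good" in from_nested)  # True
```

Only the nested form `[words]` produces whole words as tokens, which matches the `[data]` fix above.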