Evaluating a word2vec model with SimLex-999

Time: 2019-08-26 06:31:05

Tags: python-3.x gensim word2vec

I have trained my model with Gensim. Now I want to evaluate it with SimLex-999, but doing so raises an error. My code:

model.wv.evaluate_word_analogies('SimLex-999.txt')
2019-08-25 13:43:22,766 : INFO : Evaluating word analogies for top 300000 words in the model on SimLex-999.txt

Error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-60cb96c45579> in <module>()
----> 1 model.wv.evaluate_word_analogies('SimLex-999.txt')

C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\keyedvectors.py in evaluate_word_analogies(self, analogies, restrict_vocab, case_insensitive, dummy4unknown)
   1088             else:
   1089                 if not section:
-> 1090                     raise ValueError("Missing section header before line #%i in %s" % (line_no, analogies))
   1091                 try:
   1092                     if case_insensitive:

ValueError: Missing section header before line #0 in SimLex-999.txt

I also tried

from gensim.test.utils import datapath

similarities = model.wv.evaluate_word_pairs(datapath('SimLex-999.txt'), dummy4unknown=True)

print(similarities)

But it gives me a KeyError. Please help me solve this problem.

KeyError                                  Traceback (most recent call last)
<ipython-input-29-caeb682cb7ff> in <module>()
      1 from gensim.test.utils import datapath
      2 
----> 3 similarities = model.wv.evaluate_word_pairs(datapath('SimLex-999.txt'),dummy4unknown=True)
      4 
      5 print(similarities)

C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\keyedvectors.py in evaluate_word_pairs(self, pairs, delimiter, restrict_vocab, case_insensitive, dummy4unknown)
   1287 
   1288         """
-> 1289         ok_vocab = [(w, self.vocab[w]) for w in self.index2word[:restrict_vocab]]
   1290         ok_vocab = {w.upper(): v for w, v in reversed(ok_vocab)} if case_insensitive else dict(ok_vocab)
   1291 

C:\ProgramData\Anaconda3\lib\site-packages\gensim\models\keyedvectors.py in <listcomp>(.0)
   1287 
   1288         """
-> 1289         ok_vocab = [(w, self.vocab[w]) for w in self.index2word[:restrict_vocab]]
   1290         ok_vocab = {w.upper(): v for w, v in reversed(ok_vocab)} if case_insensitive else dict(ok_vocab)
   1291 

KeyError: 'movie'

1 answer:

Answer 0 (score: 0)

SimLex-999.txt does not appear to be a list of word analogies, which is what evaluate_word_analogies() expects as its argument.
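For context, the ValueError in the question follows from the file layout. As far as I know, an analogy file in the style of questions-words.txt is split into sections, each opened by a header line beginning with ': ' and followed by lines of four words, whereas SimLex-999.txt starts with a row of column names. A minimal sketch of that check, where first_line_is_section_header is a hypothetical helper and not part of gensim, and the sample lines are illustrative:

```python
def first_line_is_section_header(lines):
    """Return True if the first non-blank line looks like ': section-name'."""
    for line in lines:
        if line.strip():
            return line.startswith(": ")
    return False


# Analogy-style content: a section header, then four words per line.
analogy_lines = [": capital-common-countries\n", "Athens Greece Baghdad Iraq\n"]
# SimLex-style content (assumed layout): tab-separated column names, no section header.
simlex_lines = ["word1\tword2\tPOS\tSimLex999\n"]

print(first_line_is_section_header(analogy_lines))
print(first_line_is_section_header(simlex_lines))
```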

Have you tried the evaluate_word_pairs() function? Its documentation is at:

https://radimrehurek.com/gensim/models/keyedvectors.html#gensim.models.keyedvectors.Word2VecKeyedVectors.evaluate_word_pairs
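One thing that may be worth checking before calling evaluate_word_pairs(): it appears to expect three delimiter-separated fields per line (word, word, score), while the downloadable SimLex-999.txt carries a header row and several extra columns. The sketch below reduces such a file to three columns using only the standard library. The column names word1, word2 and SimLex999 are assumptions about the file's header, simlex_to_pairs is a hypothetical helper, and the sample rows are made up for illustration:

```python
import csv
import io


def simlex_to_pairs(src, dst, score_column="SimLex999"):
    """Copy word1, word2 and one score column from src to dst as tab-separated rows.

    src and dst are open text file objects. The column names are assumptions
    about the SimLex-999 header row. Returns the number of rows written.
    """
    reader = csv.DictReader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    count = 0
    for row in reader:
        writer.writerow([row["word1"], row["word2"], row[score_column]])
        count += 1
    return count


# In-memory sample with made-up scores, standing in for the real file.
sample = io.StringIO(
    "word1\tword2\tPOS\tSimLex999\n"
    "cat\tdog\tN\t1.0\n"
    "big\tlarge\tA\t2.0\n"
)
out = io.StringIO()
n_rows = simlex_to_pairs(sample, out)
print(n_rows)
print(out.getvalue(), end="")
```

With real files, src and dst would be opened with open(...), and the resulting path (for example a hypothetical simlex_pairs.tsv) could then be passed to model.wv.evaluate_word_pairs(). Note also that datapath() resolves names inside gensim's bundled test data directory, so a file stored elsewhere would be passed by its own path instead.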