Trouble running gensim Word2Vec

Time: 2018-03-12 06:34:35

Tags: word2vec gensim

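I am trying to train word embeddings (word2vec) on my own dataset using the gensim library.

model = Word2Vec(sentences=alp[:20], size=100, window=6, min_count=5), where alp is a list containing the tokens of the individual sentences in my corpus.

Whenever I try to train the w2v model, I get the following error. Please help.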

`Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.5/dist-packages/gensim/models/word2vec.py", line 867, in worker_loop
    tally, raw_tally = self._do_train_job(sentences, alpha, (work, neu1))
  File "/usr/local/lib/python3.5/dist-packages/gensim/models/word2vec.py", line 785, in _do_train_job
    tally += train_batch_cbow(self, sentences, alpha, work, neu1, self.compute_loss)
  File "gensim/models/word2vec_inner.pyx", line 458, in gensim.models.word2vec_inner.train_batch_cbow (./gensim/models/word2vec_inner.c:5642)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()`

`Exception in thread Thread-1:
    Traceback (most recent call last):
      File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
        self.run()
      File "/usr/lib/python3.5/threading.py", line 862, in run
        self._target(*self._args, **self._kwargs)
      File "/usr/local/lib/python3.5/dist-packages/gensim/models/word2vec.py", line 867, in worker_loop
        tally, raw_tally = self._do_train_job(sentences, alpha, (work, neu1))
      File "/usr/local/lib/python3.5/dist-packages/gensim/models/word2vec.py", line 785, in _do_train_job
        tally += train_batch_cbow(self, sentences, alpha, work, neu1, self.compute_loss)
      File "gensim/models/word2vec_inner.pyx", line 458, in gensim.models.word2vec_inner.train_batch_cbow (./gensim/models/word2vec_inner.c:5642)
    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()`


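2 Answers:

Answer 0 (score: 1)

The problem was solved by converting alp to a list of lists, so that every sentence is a plain Python list of tokens.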
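The answer does not say what type alp had originally, so the following is only a minimal sketch of such a conversion. The helper name `to_list_of_lists` and the sample data are hypothetical; the same loop should also work if the sentences are NumPy arrays or a pandas Series, since those can be iterated in the same way.

```python
def to_list_of_lists(corpus):
    """Return the corpus as a plain list of lists of str tokens."""
    sentences = []
    for sentence in corpus:
        if isinstance(sentence, str):
            # A bare string would otherwise be iterated character by character.
            sentence = sentence.split()
        sentences.append([str(token) for token in sentence])
    return sentences


# Hypothetical stand-in for the original alp: one tuple and one raw string.
raw_alp = (("this", "is", "first", "sentence"), "this is second sentence")
alp = to_list_of_lists(raw_alp)
```

The converted alp can then be passed as the sentences argument exactly as in the question.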

Answer 1 (score: 0)

The code above works fine for me. Can you verify the type of alp[:20]?

Working code (tested with gensim version 3.4.0):

    from gensim.models.word2vec import Word2Vec
    # gensim 3.x API; in gensim 4.0 and later the size parameter is named vector_size
    model = Word2Vec(sentences=alp[0:20], size=100, window=6, min_count=5)

alp looks like this:

    alp = [['this','is','first','sentence'],
          ['this','is','second','sentence'],
          [..],  
          [..],  
          [..]]
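One way to verify the type of alp[:20], as asked above, is to inspect the outer container, each sentence, and each token. This is a rough sketch; the helper names `describe_corpus` and `is_list_of_token_lists` and the sample data are made up for illustration.

```python
def describe_corpus(corpus):
    """Return the outer type name plus the sets of sentence and token type names."""
    sentence_types = {type(sentence).__name__ for sentence in corpus}
    token_types = {type(token).__name__ for sentence in corpus for token in sentence}
    return type(corpus).__name__, sentence_types, token_types


def is_list_of_token_lists(corpus):
    """True only if corpus is a list whose items are lists of str tokens."""
    return isinstance(corpus, list) and all(
        isinstance(sentence, list)
        and all(isinstance(token, str) for token in sentence)
        for sentence in corpus
    )


# Hypothetical sample in the shape shown above.
sample = [["this", "is", "first", "sentence"],
          ["this", "is", "second", "sentence"]]
print(describe_corpus(sample[:20]))
print(is_list_of_token_lists(sample[:20]))
```

If the check fails for your own alp[:20], the reported type names indicate which level (container, sentence, or token) needs converting.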