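I am trying to train word embeddings (word2vec) on my own dataset using the gensim library.
model = Word2Vec(sentences=alp[:20],size=100, window=6, min_count=5)
Here alp is a list containing the tokens of the individual sentences in my corpus.
Whenever I try to train the w2v model, I get the following error. Please help.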
`Exception in thread Thread-2:
Traceback (most recent call last):
File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
File "/usr/lib/python3.5/threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.5/dist-packages/gensim/models/word2vec.py", line 867, in worker_loop
tally, raw_tally = self._do_train_job(sentences, alpha, (work, neu1))
File "/usr/local/lib/python3.5/dist-packages/gensim/models/word2vec.py", line 785, in _do_train_job
tally += train_batch_cbow(self, sentences, alpha, work, neu1, self.compute_loss)
File "gensim/models/word2vec_inner.pyx", line 458, in gensim.models.word2vec_inner.train_batch_cbow (./gensim/models/word2vec_inner.c:5642)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()`
`Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
File "/usr/lib/python3.5/threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.5/dist-packages/gensim/models/word2vec.py", line 867, in worker_loop
tally, raw_tally = self._do_train_job(sentences, alpha, (work, neu1))
File "/usr/local/lib/python3.5/dist-packages/gensim/models/word2vec.py", line 785, in _do_train_job
tally += train_batch_cbow(self, sentences, alpha, work, neu1, self.compute_loss)
File "gensim/models/word2vec_inner.pyx", line 458, in gensim.models.word2vec_inner.train_batch_cbow (./gensim/models/word2vec_inner.c:5642)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()`
Answer 0 (score: 1)
I solved the problem by converting alp to a list of lists.
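The answer does not say what type alp was before the conversion. As a minimal sketch, assuming it was some non-list container of token sequences (for example a NumPy array or a pandas Series), the conversion could look like this. The helper name to_list_of_lists and the sample data are hypothetical; a tuple of tuples stands in for the original container so the snippet runs without extra libraries.

```python
def to_list_of_lists(corpus):
    """Return a plain list of lists of str built from any iterable of token sequences."""
    # str(tok) turns non-str token objects (e.g. NumPy string scalars) into ordinary str.
    return [[str(tok) for tok in sentence] for sentence in corpus]


# Hypothetical stand-in for the original, non-list corpus container.
raw_corpus = (
    ("this", "is", "first", "sentence"),
    ("this", "is", "second", "sentence"),
)

alp = to_list_of_lists(raw_corpus)
print(alp)
```

The resulting alp has the same nested-list shape that the other answer reports as working.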
Answer 1 (score: 0)
The code above works fine for me. Could you verify the type of alp[:20]?
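One way to check, as a rough sketch: look at the type of the container, of each sentence, and of each token. The helper name describe_corpus is made up for illustration, and the sample alp below is only a placeholder for your real data.

```python
def describe_corpus(sample):
    """Return (container type name, set of sentence type names, set of token type names)."""
    sentence_types = {type(sentence).__name__ for sentence in sample}
    token_types = {type(token).__name__ for sentence in sample for token in sentence}
    return type(sample).__name__, sentence_types, token_types


# Placeholder data; substitute your own alp here.
alp = [["this", "is", "first", "sentence"],
       ["this", "is", "second", "sentence"]]

print(describe_corpus(alp[:20]))  # ('list', {'list'}, {'str'})
```

If any of the three values is something other than list, list, and str (for instance ndarray), that mismatch is a likely place to look.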
Working code (tested with gensim version 3.4.0):
from gensim.models.word2vec import Word2Vec
model = Word2Vec(sentences=alp[0:20],size=100,window=6,min_count=5)
where alp looks like this:
alp = [['this','is','first','sentence'],
['this','is','second','sentence'],
[..],
[..],
[..]]