I am following this blog post to train doc2vec on the Wikipedia corpus with gensim (https://markroxor.github.io/gensim/static/notebooks/doc2vec-wikipedia.html). I am running Python 3.6.4 and gensim 3.7.3. However, I get an error when running the following code:
models = [
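    # PV-DBOW, also training word vectors (dbow_words=1)
    Doc2Vec(dm=0, dbow_words=1, size=200, window=8, min_count=19, iter=10, workers=cores),
    # PV-DM with averaged context vectors (dm_mean=1)
    Doc2Vec(dm=1, dm_mean=1, size=200, window=8, min_count=19, iter=10, workers=cores),
]
models[0].build_vocab(documents)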
The error I get is:
Process InputQueue-16:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/gensim/utils.py", line 1218, in run
    wrapped_chunk = [list(chunk)]
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/gensim/corpora/wikicorpus.py", line 676, in <genexpr>
    ((text, self.lemmatize, title, pageid, tokenization_params)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/gensim/corpora/wikicorpus.py", line 424, in extract_pages
    for elem in elems:
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/gensim/corpora/wikicorpus.py", line 409, in <genexpr>
    elems = (elem for _, elem in iterparse(f, events=("end",)))
  File "/home/ubuntu/anaconda3/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/home/ubuntu/anaconda3/lib/python3.6/bz2.py", line 182, in read
    return self._buffer.read(size)
  File "/home/ubuntu/anaconda3/lib/python3.6/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/home/ubuntu/anaconda3/lib/python3.6/_compression.py", line 97, in read
    rawblock = self._fp.read(BUFFER_SIZE)
OSError: [Errno 5] Input/output error
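I don't even know where to start fixing this error. After the error message appears, Python seems to keep running. I'm not sure whether it is stuck in a loop, or whether the exception was thrown and the code can still run.
Any help would be appreciated. Thanks!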
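Since the traceback ends inside bz2 while reading the dump file, one check that might narrow this down is to read the compressed file end to end outside gensim and see whether the same OSError shows up. The following is only a minimal sketch: the helper name check_bz2_readable is hypothetical, and the path passed to it is whatever dump file the documents corpus was built from.

```python
import bz2


def check_bz2_readable(path, chunk_size=16 * 1024):
    """Read a .bz2 file to the end without parsing it.

    Returns (decompressed_bytes_read, error), where error is None if the
    whole file could be read, or the exception that stopped the read.
    """
    total = 0
    try:
        with bz2.open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:  # empty bytes means end of file
                    break
                total += len(chunk)
    except (OSError, EOFError) as exc:
        # OSError covers I/O failures and invalid bz2 data;
        # EOFError is raised for a truncated compressed stream.
        return total, exc
    return total, None
```

If this returns an error for the dump file, the problem is probably with the file or the disk it sits on rather than with Doc2Vec; if it returns None, the file itself is at least readable in a single process.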