OSError: [Errno 5] Input/output error when running gensim Wiki doc2vec

Time: 2019-05-16 00:20:46

Tags: python-3.x gensim

I am following this blog post to train doc2vec on the Wikipedia corpus with gensim: https://markroxor.github.io/gensim/static/notebooks/doc2vec-wikipedia.html. I am running Python 3.6.4 and gensim 3.7.3. However, I get an error when running the following:
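import multiprocessing
from gensim.models.doc2vec import Doc2Vec
cores = multiprocessing.cpu_count()  # assumed: defined this way earlier in the notebook
models = [
    Doc2Vec(dm=0, dbow_words=1, size=200, window=8, min_count=19, iter=10, workers=cores),  # PV-DBOW
    Doc2Vec(dm=1, dm_mean=1, size=200, window=8, min_count=19, iter=10, workers=cores)]  # PV-DM, averaged context
models[0].build_vocab(documents)  # documents: the tagged Wikipedia corpus built earlier in the notebook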

The error I get is:

Process InputQueue-16:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/gensim/utils.py", line 1218, in run
    wrapped_chunk = [list(chunk)]
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/gensim/corpora/wikicorpus.py", line 676, in <genexpr>
    ((text, self.lemmatize, title, pageid, tokenization_params)
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/gensim/corpora/wikicorpus.py", line 424, in extract_pages
    for elem in elems:
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/gensim/corpora/wikicorpus.py", line 409, in <genexpr>
    elems = (elem for _, elem in iterparse(f, events=("end",)))
  File "/home/ubuntu/anaconda3/lib/python3.6/xml/etree/ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "/home/ubuntu/anaconda3/lib/python3.6/bz2.py", line 182, in read
    return self._buffer.read(size)
  File "/home/ubuntu/anaconda3/lib/python3.6/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/home/ubuntu/anaconda3/lib/python3.6/_compression.py", line 97, in read
    rawblock = self._fp.read(BUFFER_SIZE)
OSError: [Errno 5] Input/output error

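I don't even know where to start fixing this error. After the error message is printed, Python appears to keep running. I'm not sure whether it is stuck in a loop, or whether the exception was thrown and the code is still able to run.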
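One possible starting point, as a minimal sketch rather than a confirmed fix: the last frames of the traceback are inside `bz2` and `_compression`, reading raw bytes from the dump file, so reading that file end to end without gensim should show whether the file (or the disk it lives on) is the problem. The helper name `scan_bz2` and its `chunk_size` parameter are hypothetical, and the path in the usage comment is a placeholder.

```python
import bz2


def scan_bz2(path, chunk_size=1024 * 1024):
    """Decompress a .bz2 file to the end without keeping the data.

    Returns (bytes_read, error). error is None if the whole file was read
    cleanly; otherwise it is the exception that stopped the read
    (OSError for I/O or invalid-data failures, EOFError for a truncated file).
    """
    total = 0
    try:
        with bz2.open(path, "rb") as f:
            while True:
                block = f.read(chunk_size)
                if not block:  # empty bytes means end of stream
                    break
                total += len(block)
    except (OSError, EOFError) as exc:
        return total, exc
    return total, None


# Usage (placeholder path; substitute the dump file passed to WikiCorpus):
# total, err = scan_bz2("/path/to/wiki-dump.xml.bz2")
# print(total, repr(err))
```

If this also stops with `OSError: [Errno 5]`, the failure is reproducible outside gensim, which would point at the file or its storage rather than at the Doc2Vec code.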

Any help is appreciated. Thanks!

0 answers:

No answers