我正在从python gensim库运行LDAMulticore,该脚本似乎无法创建多个线程。这是错误:
Traceback (most recent call last):
File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib64/python2.7/multiprocessing/pool.py", line 97, in worker
initializer(*initargs)
File "/usr/lib64/python2.7/site-packages/gensim/models/ldamulticore.py", line 333, in worker_e_step
worker_lda.do_estep(chunk) # TODO: auto-tune alpha?
File "/usr/lib64/python2.7/site-packages/gensim/models/ldamodel.py", line 725, in do_estep
gamma, sstats = self.inference(chunk, collect_sstats=True)
File "/usr/lib64/python2.7/site-packages/gensim/models/ldamodel.py", line 655, in inference
ids = [int(idx) for idx, _ in doc]
TypeError: 'int' object is not iterable
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
self.run()
File "/usr/lib64/python2.7/threading.py", line 765, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/lib64/python2.7/multiprocessing/pool.py", line 325, in _handle_workers
pool._maintain_pool()
File "/usr/lib64/python2.7/multiprocessing/pool.py", line 229, in _maintain_pool
self._repopulate_pool()
File "/usr/lib64/python2.7/multiprocessing/pool.py", line 222, in _repopulate_pool
w.start()
File "/usr/lib64/python2.7/multiprocessing/process.py", line 130, in start
self._popen = Popen(self)
File "/usr/lib64/python2.7/multiprocessing/forking.py", line 121, in __init__
self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
我正在这样创建我的LDA模型:
ldamodel = LdaMulticore(corpus, num_topics=50, id2word = dictionary, workers=3)
我实际上已经问过有关此脚本的另一个问题,因此可以在此处找到完整的脚本:
Gensim LDA Multicore Python script runs much too slow
如果相关的话,我正在CentOS服务器上运行它。让我知道是否应包括其他信息。
感谢您的帮助!
答案 0 :(得分:1)
OSError: [Errno 12] Cannot allocate memory
听起来好像RAM用完了。
检查可用内存并进行交换。
您可以尝试使用workers
参数减少线程数,或者使用chunksize
参数减少每个训练块中要使用的文档数。