gensim.LDAMulticore throwing an exception:

Time: 2019-01-30 03:02:01

Tags: python gensim multicore lda

I am running LdaMulticore from the Python gensim library, and the script does not seem to be able to create more than one thread. Here is the error:

Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 97, in worker
    initializer(*initargs)
  File "/usr/lib64/python2.7/site-packages/gensim/models/ldamulticore.py", line 333, in worker_e_step
    worker_lda.do_estep(chunk)  # TODO: auto-tune alpha?
  File "/usr/lib64/python2.7/site-packages/gensim/models/ldamodel.py", line 725, in do_estep
    gamma, sstats = self.inference(chunk, collect_sstats=True)
  File "/usr/lib64/python2.7/site-packages/gensim/models/ldamodel.py", line 655, in inference
    ids = [int(idx) for idx, _ in doc]
TypeError: 'int' object is not iterable
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.7/threading.py", line 765, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 325, in _handle_workers
    pool._maintain_pool()
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 229, in _maintain_pool
    self._repopulate_pool()
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 222, in _repopulate_pool
    w.start()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/usr/lib64/python2.7/multiprocessing/forking.py", line 121, in __init__
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

This is how I create my LDA model:

ldamodel = LdaMulticore(corpus, num_topics=50, id2word = dictionary, workers=3)

I have already asked another question about this script, so the full script can be found here:

Gensim LDA Multicore Python script runs much too slow

In case it is relevant, I am running this on a CentOS server. Let me know if I should include any other information.

Thanks for your help!

1 Answer:

Answer 0 (score: 1)

OSError: [Errno 12] Cannot allocate memory sounds like you are running out of RAM.

Check your available memory and swap.
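One way to check memory and swap from Python on a Linux machine such as the CentOS server in the question is to read /proc/meminfo. The following is a minimal sketch, not part of the original answer: the helper names parse_meminfo and summarize are made up for illustration, and it assumes the usual "Name:   value kB" layout of that file.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def summarize(info):
    """Return available RAM and free swap in MiB."""
    # MemAvailable is missing on older kernels, so fall back to MemFree.
    available_kib = info.get("MemAvailable", info.get("MemFree", 0))
    return {
        "available_mib": available_kib // 1024,
        "swap_free_mib": info.get("SwapFree", 0) // 1024,
    }


if __name__ == "__main__":
    path = "/proc/meminfo"  # Linux only
    if os.path.exists(path):
        with open(path) as handle:
            print(summarize(parse_meminfo(handle.read())))
    else:
        print("No /proc/meminfo on this system")
```

If the available figure is small compared with the size of the corpus and model, the failing os.fork() in the traceback is the expected result.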

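You could try reducing the number of worker processes with the workers parameter, or the number of documents used in each training chunk with the chunksize parameter.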
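As a rough sketch of that suggestion, the call from the question could be wrapped as below. The function name train_lda_low_memory and the values workers=1 and chunksize=500 are illustrative assumptions, not tuned recommendations; corpus and dictionary are assumed to be the same objects built in the question's script.

```python
def train_lda_low_memory(corpus, dictionary, num_topics=50, workers=1, chunksize=500):
    """Train LdaMulticore with fewer worker processes and smaller chunks.

    The default values here are illustrative guesses; adjust them to the
    memory actually available on the machine.
    """
    # Imported inside the function so this module loads even without gensim.
    from gensim.models import LdaMulticore

    return LdaMulticore(
        corpus,
        num_topics=num_topics,
        id2word=dictionary,
        workers=workers,      # the question used 3; each worker is a separate process
        chunksize=chunksize,  # gensim's default is 2000 documents per chunk
    )
```

Usage would then be ldamodel = train_lda_low_memory(corpus, dictionary). If that runs without the OSError, workers or chunksize can be raised step by step until memory becomes the limit again.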